This tutorial introduces Structural Equation Modelling (SEM) — a powerful and flexible family of multivariate statistical techniques that allows researchers to simultaneously model multiple relationships among variables, account for measurement error, and test theories about constructs that cannot be directly observed. Where simple linear regression models a single outcome from one or more predictors, SEM can model entire systems of relationships, including situations where the same variable acts as both a predictor and an outcome, and where some of the most important variables in a theory are not directly measurable at all.
SEM is particularly well suited to the language sciences. Much of what linguists and applied linguists care about — language anxiety, motivation, metalinguistic awareness, communicative competence, reading ability — cannot be captured in a single measurement. These are latent constructs: theoretical entities that we infer indirectly from a set of observable indicators such as questionnaire items or test scores. SEM provides a principled framework for doing exactly this, and then for examining how these latent constructs relate to one another and to observable outcomes.
SEM is increasingly recognised as a valuable tool in corpus linguistics and cognitive linguistics. Larsson, Plonsky, and Hancock (2021) make the case that path models, a fundamental building block of SEM, are well suited to the multivariate nature of corpus-linguistic data, enabling researchers to move beyond monofactorial analyses and test theoretically motivated causal structures. Fuoli (2022) provides a step-by-step introduction to SEM in R for linguists working in a cognitive-linguistic framework, demonstrating its utility for modelling the psychological effects of linguistic choices. Rosseel’s (2012) lavaan package, which we use throughout this tutorial, has made full-featured SEM freely available in R.
This tutorial is aimed at beginners with no prior exposure to SEM. You do not need to have studied factor analysis or path analysis before, though familiarity with basic regression is helpful. The goal is to build conceptual understanding from the ground up and to equip you with the practical skills to fit, evaluate, and report SEM models in R.
Learning Objectives
By the end of this tutorial you will be able to:
Explain the distinction between observed and latent variables and describe why measurement error matters
Identify the two building blocks of a full SEM — the measurement model and the structural model — and describe what each specifies
Read and interpret a standard SEM path diagram
Specify a Confirmatory Factor Analysis (CFA) in lavaan model syntax
Evaluate a CFA using model fit indices (CFI, TLI, RMSEA, SRMR) and reliability coefficients (McDonald’s ω)
Extend a measurement model to a full SEM by adding structural paths
Interpret standardised path coefficients and R² values from a full SEM
Test mediation hypotheses using labelled paths and bootstrapped confidence intervals
Compare nested and non-nested SEM specifications using Δχ², AIC, and BIC
Use modification indices responsibly to diagnose model misfit
Report SEM results in accordance with current best-practice conventions in linguistics and applied linguistics
Prerequisite Tutorials
Before working through this tutorial, we recommend familiarity with the following:
Martin Schweinberger. 2026. Structural Equation Modelling in R. The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. url: https://ladal.edu.au/tutorials/sem/sem.html (Version 2026.03.28).
library(lavaan)     # SEM and CFA estimation
library(semPlot)    # path diagram visualisation
library(semTools)   # reliability and model comparison tools
library(psych)      # descriptive statistics and correlation matrices
library(dplyr)      # data manipulation
library(ggplot2)    # data visualisation
library(tidyr)      # data reshaping
library(flextable)  # formatted tables
library(checkdown)  # interactive quiz questions
The Dataset
Throughout this tutorial we use a simulated dataset inspired by research on second-language (L2) writing. The data represent 300 university students who completed a battery of questionnaire scales and an academic writing task. The dataset includes:
Language Anxiety (anx1–anx3): three Likert-scale items measuring the degree to which students feel anxious when writing in their L2 (higher = more anxious)
Writing Self-Efficacy (eff1–eff3): three items measuring students’ confidence in their L2 writing ability (higher = greater self-efficacy)
Motivation (mot1–mot3): three items measuring students’ intrinsic motivation to improve their L2 writing (higher = more motivated)
Writing Score (writing_score): a holistic score (0–100) assigned by trained raters to an in-class academic writing task
Because the data are simulated in R, no external file is needed — you can reproduce the entire analysis from the code below.
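The simulation code itself does not appear in this section. The following is a minimal sketch of how a dataset with this structure could be generated in base R. The seed, loadings, and path values are illustrative assumptions, so the sketch will not reproduce the exact estimates reported later in the tutorial.

```r
# Hypothetical stand-in for the tutorial's dataset: the seed, loadings, and
# path values below are assumptions chosen for illustration only.
set.seed(2026)
n <- 300

# Latent variables (these never become columns of the final data frame)
ANX <- rnorm(n)                                # language anxiety
EFF <- rnorm(n)                                # writing self-efficacy
MOT <- 0.5 * EFF + rnorm(n, sd = sqrt(0.75))   # motivation, partly driven by EFF

# Helper: three indicators, each = loading * latent score + unique error
make_items <- function(latent, stem, loading = 0.75) {
  items <- sapply(1:3, function(i) {
    loading * latent + rnorm(length(latent), sd = sqrt(1 - loading^2))
  })
  colnames(items) <- paste0(stem, 1:3)
  as.data.frame(items)
}

# Observed data: nine items plus the writing score
# (the score is not truncated to the 0-100 range in this sketch)
semdata <- cbind(
  make_items(ANX, "anx"),
  make_items(EFF, "eff"),
  make_items(MOT, "mot"),
  writing_score = 60 - 6 * ANX + 11 * EFF + 4 * MOT + rnorm(n, sd = 5)
)

str(semdata)
```

Only the ten observed columns are kept; the latent scores are discarded, which mirrors the situation in real research where constructs are never observed directly.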
What you will learn: The core ideas underpinning SEM — latent variables, measurement error, path diagrams, and the two-component structure of a full SEM.
Why it matters: SEM notation and vocabulary are quite different from ordinary regression. Building a solid conceptual foundation before fitting models prevents common misinterpretations.
Observed vs. latent variables
A fundamental distinction in SEM is between variables you can observe directly and those you cannot.
Observed (manifest) variables are things you actually measure and record: a Likert-scale item, a test score, a reaction time, a corpus frequency count. They appear as columns in your dataset.
Latent variables are theoretical constructs that you cannot measure directly. Language anxiety, motivation, and writing self-efficacy are classic examples from applied linguistics. No single questionnaire item perfectly captures any of these constructs — each item is merely a fallible indicator. Latent variables are never columns in your dataset; instead, they are modelled as common causes of their observed indicators.
This distinction matters because measurement error is unavoidable whenever we use observed items to represent theoretical constructs. If we ignore this error — for example, by averaging questionnaire items and treating the result as if it were the true construct — we introduce attenuation bias into our estimates of relationships. SEM addresses this explicitly: it partitions the variance in each observed indicator into a part explained by the underlying latent variable and a part attributed to unique error (random noise plus any systematic variance not shared with the other indicators). As Larsson, Plonsky, and Hancock (2021) note, treating latent variables such as motivation and proficiency as observed (e.g., by using composite scores) leads to underestimation of relationships, which is one of the key arguments for using SEM in language research.
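A small simulation can make attenuation bias concrete. The sketch below uses made-up values (a true latent correlation of about .50 and standardised loadings of .70), not the tutorial's data, and compares the correlation between two latent variables with the correlation between composite scores built from their fallible indicators.

```r
# Illustrative simulation; all values are assumptions, not the tutorial's data
set.seed(1)
n <- 5000
X <- rnorm(n)                               # latent predictor
Y <- 0.5 * X + rnorm(n, sd = sqrt(0.75))    # latent outcome, true r of about .50

# Three fallible indicators per construct (standardised loading = .70)
indicators <- function(latent, loading = 0.7) {
  sapply(1:3, function(i) {
    loading * latent + rnorm(length(latent), sd = sqrt(1 - loading^2))
  })
}
x_composite <- rowMeans(indicators(X))      # averaged items for X
y_composite <- rowMeans(indicators(Y))      # averaged items for Y

r_latent    <- cor(X, Y)                       # relationship between the constructs
r_composite <- cor(x_composite, y_composite)   # relationship between the composites

round(c(latent = r_latent, composite = r_composite), 2)
```

The composite correlation comes out noticeably smaller than the latent correlation, because the unique error in each item is carried into the averaged score.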
The two building blocks of SEM
A full structural equation model is composed of two sub-models:
| Sub-model | Technical name | What it specifies |
|---|---|---|
| Measurement model | Confirmatory Factor Analysis (CFA) | Which observed items are indicators of which latent variables; how strongly each item loads onto its construct; how much unique error each item has |
| Structural model | Path model | Directional relationships among latent variables (and between latent variables and observed outcomes); regression-like paths encoding theoretical predictions |
In the standard two-step approach to SEM (Anderson and Gerbing 1988), researchers first establish an adequate measurement model (Step 1) before testing the structural paths of theoretical interest (Step 2). This tutorial follows this workflow: we build and evaluate a CFA in Section 3 and then add structural paths in Section 5.
Path diagrams
SEM models are almost always communicated visually through path diagrams. The notation is standardised:
| Symbol | Represents |
|---|---|
| Rectangle | Observed (manifest) variable |
| Oval / ellipse | Latent variable |
| Single-headed arrow (→) | Directional path (a regression-type effect) |
| Double-headed curved arrow (↔) | Covariance or correlation |
| Small arrow into rectangle | Residual / unique error for that indicator |
| Small arrow into oval | Disturbance (residual error for an endogenous latent variable) |
In a measurement model, ovals point to rectangles: the latent construct is hypothesised to cause variation in its observed indicators. In a structural model, ovals point to other ovals, encoding directional theoretical predictions among constructs.
SEM is a confirmatory, theory-driven method
Unlike Exploratory Factor Analysis (EFA), which discovers factor structure empirically from the data, SEM requires the researcher to specify the model in advance based on theory. Every path in the diagram — every arrow that is included or excluded — reflects a theoretical decision. A good model fit indicates that the specified model is consistent with the data; it does not prove the model is the only correct one. Alternative models that fit equally well are always possible (this is the problem of equivalent models). Always ground your SEM specifications in theory, not post-hoc data exploration (Kline 2023).
A conceptual map of our example
Our theoretical model for the L2 writing dataset can be described as follows:
Language Anxiety, Writing Self-Efficacy, and Motivation are latent constructs, each measured by three questionnaire items.
We expect Self-Efficacy and Anxiety to have opposite effects on Writing Score: greater self-efficacy should improve performance; greater anxiety should impair it.
Self-Efficacy is also expected to influence Motivation (students who feel more capable tend to be more motivated), and Motivation may in turn have a positive effect on Writing Score. This indirect path constitutes a mediation hypothesis.
This conceptual model drives all the analytic choices that follow.
Descriptive Statistics and Correlations
Section Overview
What you will learn: How to examine the observed variables before fitting any model.
Key steps: Descriptive statistics, distribution checks, inter-item correlations.
Before fitting any SEM, it is good practice to examine the distributions and inter-relationships of your observed variables. Severe non-normality or implausible correlations can signal problems that need to be addressed before modelling.
All items are centred near zero (as expected for standardised simulated data). Skewness values are within the acceptable range of [−1, +1] for all items, meaning that the normality assumption required for maximum likelihood estimation in lavaan (Rosseel 2012) is not substantially violated.
Correlation matrix
A correlation matrix helps us verify that items within the same scale correlate with each other (convergent evidence) and that items from different scales correlate less strongly (discriminant evidence).
Code
# Pearson correlations among the nine questionnaire items
cor_mat <- cor(semdata |> dplyr::select(-writing_score)) |>
  round(2)

# Display the matrix as a formatted table
cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Variable") |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 10) |>
  flextable::fontsize(size = 10, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Pearson correlation matrix for the nine questionnaire items.") |>
  flextable::border_outer()
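| Variable | anx1 | anx2 | anx3 | eff1 | eff2 | eff3 | mot1 | mot2 | mot3 |
|---|---|---|---|---|---|---|---|---|---|
| anx1 | 1.00 | 0.56 | 0.57 | 0.03 | -0.04 | 0.00 | 0.01 | -0.03 | -0.01 |
| anx2 | 0.56 | 1.00 | 0.56 | 0.03 | -0.07 | 0.00 | 0.03 | -0.01 | -0.01 |
| anx3 | 0.57 | 0.56 | 1.00 | 0.01 | -0.04 | -0.01 | 0.05 | -0.02 | 0.07 |
| eff1 | 0.03 | 0.03 | 0.01 | 1.00 | 0.60 | 0.58 | 0.30 | 0.36 | 0.32 |
| eff2 | -0.04 | -0.07 | -0.04 | 0.60 | 1.00 | 0.53 | 0.24 | 0.31 | 0.27 |
| eff3 | 0.00 | 0.00 | -0.01 | 0.58 | 0.53 | 1.00 | 0.23 | 0.29 | 0.22 |
| mot1 | 0.01 | 0.03 | 0.05 | 0.30 | 0.24 | 0.23 | 1.00 | 0.53 | 0.50 |
| mot2 | -0.03 | -0.01 | -0.02 | 0.36 | 0.31 | 0.29 | 0.53 | 1.00 | 0.56 |
| mot3 | -0.01 | -0.01 | 0.07 | 0.32 | 0.27 | 0.22 | 0.50 | 0.56 | 1.00 |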
Code
# Reshape the correlation matrix to long format for plotting
cor_long <- cor_mat |>
  as.data.frame() |>
  tibble::rownames_to_column("Var1") |>
  tidyr::pivot_longer(-Var1, names_to = "Var2", values_to = "r")

# Heatmap with the correlation printed in each tile
ggplot(cor_long, aes(x = Var1, y = Var2, fill = r)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(r, 2)), size = 3.2) +
  scale_fill_gradient2(low = "tomato", mid = "white", high = "steelblue",
                       midpoint = 0, limits = c(-1, 1), name = "r") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        panel.grid = element_blank()) +
  labs(title = "Correlation heatmap: nine questionnaire items",
       x = "", y = "")
The heatmap confirms the expected pattern: items within each scale (e.g., anx1–anx3) correlate strongly with each other and more weakly with items from the other scales. The efficacy and motivation items show moderate cross-scale correlations, consistent with our theoretical expectation that the two constructs are related.
Confirmatory Factor Analysis (CFA)
Section Overview
What you will learn: How to specify, fit, and evaluate a measurement model using CFA in lavaan.
Key concepts: Factor loadings, model fit indices, reliability, convergent and discriminant validity.
Why CFA before SEM: The measurement model must be established before structural paths are meaningful. If your indicators do not adequately reflect the intended latent constructs, the structural estimates will be uninterpretable.
What is Confirmatory Factor Analysis?
Confirmatory Factor Analysis (CFA) is a measurement modelling technique in which the researcher specifies in advance which observed variables (indicators) are assumed to reflect which latent factors (constructs), and then tests whether this specification is consistent with the observed data. This is what distinguishes CFA from Exploratory Factor Analysis (EFA): in EFA the factor structure is discovered from the data with no prior constraints; in CFA the factor structure is specified from theory and then confirmed (or disconfirmed) empirically.
In our example, we hypothesise three latent factors:
Anxiety (ANX), indicated by anx1, anx2, anx3
Self-Efficacy (EFF), indicated by eff1, eff2, eff3
Motivation (MOT), indicated by mot1, mot2, mot3
Specifying a CFA model in lavaan
The lavaan package (Rosseel 2012) uses a simple, readable model syntax. The key operator for defining a measurement model is =~ which is read as “is measured by” or “is indicated by”:
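The model syntax for the three-factor measurement model is not shown at this point in the text. Based on the factor definitions listed above, it would look like the following sketch. The object name `cfa_model` is an assumption introduced here for illustration.

```r
# Measurement model: each latent factor is measured by (=~) its three indicators.
# `cfa_model` is an assumed object name; the model is stored as a character string.
cfa_model <- '
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
'
```

Each line names one latent variable on the left of `=~` and lists its observed indicators on the right, joined by `+`.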
By default, lavaan fixes the first indicator loading to 1.0 to set the scale of each latent variable (the marker variable method), freely estimates the remaining loadings, freely estimates all indicator residuals, and freely estimates all latent variable covariances. You can change these defaults using arguments to cfa() or sem().
Fitting the CFA model
We fit the model using lavaan::cfa(). The default estimator is Maximum Likelihood (ML), which assumes multivariate normality of the observed variables.
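The fitting call itself is not shown in this section. A minimal sketch follows, assuming the data frame is called `semdata` and the fitted object `cfa_fit` (both names are used in later table code). The model string name `cfa_model` and the stand-in data generated when `semdata` is missing are assumptions, so estimates from the stand-in will differ from the output printed below.

```r
library(lavaan)

# Stand-in data, created only if `semdata` is not already in the session
# (assumed values for illustration; not the tutorial's original simulation)
if (!exists("semdata")) {
  set.seed(1); n <- 300
  ANX <- rnorm(n); EFF <- rnorm(n)
  MOT <- 0.5 * EFF + rnorm(n, sd = sqrt(0.75))
  items <- function(f, stem) {
    setNames(as.data.frame(sapply(1:3, function(i) 0.75 * f + rnorm(n, sd = 0.66))),
             paste0(stem, 1:3))
  }
  semdata <- cbind(items(ANX, "anx"), items(EFF, "eff"), items(MOT, "mot"),
                   writing_score = 60 - 6 * ANX + 11 * EFF + 4 * MOT + rnorm(n, sd = 5))
}

# Three-factor measurement model
cfa_model <- '
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
'

# Fit with the default ML estimator; writing_score is ignored because it is
# not part of the model
cfa_fit <- cfa(cfa_model, data = semdata)

# Request fit indices and standardised estimates alongside the raw estimates
summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)
```

With nine indicators there are 45 unique variances and covariances; the model estimates 21 parameters, which leaves the 24 degrees of freedom reported in the output.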
lavaan 0.6-21 ended normally after 29 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 300
Model Test User Model:
Test statistic 12.806
Degrees of freedom 24
P-value (Chi-square) 0.969
Model Test Baseline Model:
Test statistic 856.110
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.020
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3379.871
Loglikelihood unrestricted model (H1) -3373.468
Akaike (AIC) 6801.742
Bayesian (BIC) 6879.521
Sample-size adjusted Bayesian (SABIC) 6812.922
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 1.000
P-value H_0: RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.023
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX =~
anx1 1.000 0.770 0.754
anx2 0.937 0.090 10.447 0.000 0.721 0.742
anx3 0.965 0.092 10.479 0.000 0.743 0.757
EFF =~
eff1 1.000 0.833 0.824
eff2 0.923 0.082 11.253 0.000 0.768 0.733
eff3 0.825 0.075 10.992 0.000 0.687 0.706
MOT =~
mot1 1.000 0.664 0.677
mot2 1.134 0.116 9.768 0.000 0.753 0.782
mot3 1.031 0.108 9.571 0.000 0.685 0.716
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX ~~
EFF -0.004 0.046 -0.095 0.924 -0.007 -0.007
MOT 0.003 0.038 0.093 0.926 0.007 0.007
EFF ~~
MOT 0.291 0.049 5.901 0.000 0.526 0.526
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anx1 0.449 0.059 7.658 0.000 0.449 0.431
.anx2 0.424 0.053 7.990 0.000 0.424 0.449
.anx3 0.412 0.054 7.583 0.000 0.412 0.427
.eff1 0.328 0.054 6.081 0.000 0.328 0.321
.eff2 0.508 0.058 8.695 0.000 0.508 0.463
.eff3 0.475 0.051 9.277 0.000 0.475 0.502
.mot1 0.521 0.056 9.266 0.000 0.521 0.542
.mot2 0.359 0.054 6.701 0.000 0.359 0.388
.mot3 0.447 0.053 8.464 0.000 0.447 0.488
ANX 0.592 0.089 6.629 0.000 1.000 1.000
EFF 0.693 0.092 7.551 0.000 1.000 1.000
MOT 0.441 0.076 5.835 0.000 1.000 1.000
This output contains three major sections: model fit information, factor loadings (both unstandardised and standardised), and latent variable covariances.
Interpreting factor loadings
Factor loadings express how strongly each indicator is related to its underlying latent variable. In the standardised solution (column Std.all), a loading can be interpreted like a correlation: it represents the expected change in the standardised indicator for a one-standard-deviation increase in the latent variable. Standardised loadings above 0.50 are generally considered acceptable; loadings above 0.70 are considered strong (Hair et al. 2019).
Code
# Extract the standardised loadings (rows with the =~ operator)
loadings_df <- lavaan::standardizedsolution(cfa_fit) |>
  dplyr::filter(op == "=~") |>
  dplyr::select(Latent = lhs, Indicator = rhs,
                Std_Loading = est.std, SE = se,
                z = z, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~ round(.x, 3)))

# Display as a formatted table
loadings_df |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised CFA factor loadings with standard errors and significance tests.") |>
  flextable::border_outer()
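| Latent | Indicator | Std_Loading | SE | z | p |
|---|---|---|---|---|---|
| ANX | anx1 | 0.754 | 0.038 | 19.668 | 0 |
| ANX | anx2 | 0.742 | 0.039 | 19.193 | 0 |
| ANX | anx3 | 0.757 | 0.038 | 19.772 | 0 |
| EFF | eff1 | 0.824 | 0.033 | 24.642 | 0 |
| EFF | eff2 | 0.733 | 0.037 | 19.838 | 0 |
| EFF | eff3 | 0.706 | 0.038 | 18.458 | 0 |
| MOT | mot1 | 0.677 | 0.042 | 16.071 | 0 |
| MOT | mot2 | 0.782 | 0.038 | 20.464 | 0 |
| MOT | mot3 | 0.716 | 0.041 | 17.666 | 0 |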
All standardised loadings exceed 0.50 (they range from 0.68 to 0.82), confirming that each indicator is a meaningful reflection of its intended latent construct.
Model fit assessment
Fitting a CFA does not automatically produce a good model. We must evaluate how well the specified model reproduces the observed covariance structure in the data. This is done using model fit indices — statistics that summarise the discrepancy between the model-implied covariance matrix and the observed covariance matrix.
Model fit indices: what they mean and which cut-offs to use
No single fit index is sufficient. Report a combination of the following:
| Index | Full name | What it measures | Acceptable | Good |
|---|---|---|---|---|
| χ² | Chi-square test | Overall model misfit (sensitive to N) | p > .05 (rarely achieved) | n/a |
| CFI | Comparative Fit Index | Fit relative to null model | ≥ .90 | ≥ .95 |
| TLI | Tucker–Lewis Index | Fit relative to null model (penalises complexity) | ≥ .90 | ≥ .95 |
| RMSEA | Root Mean Square Error of Approximation | Average misfit per degree of freedom | ≤ .08 | ≤ .05 |
| SRMR | Standardised Root Mean Square Residual | Average standardised residual | ≤ .08 | ≤ .05 |
Cut-offs are from Hu and Bentler (1999). These are guidelines, not hard thresholds — model fit must always be evaluated in the context of model complexity and sample size (Kline 2023).
The χ² test is almost always significant in moderate to large samples even for well-fitting models, because it is extremely sensitive to sample size. It is therefore standard practice to rely on the incremental and approximate fit indices (CFI, TLI, RMSEA, SRMR) rather than on χ² alone (Fuoli 2022).
Beyond model fit, we assess whether each scale is internally consistent — that is, whether the indicators of each latent variable reliably hang together. We use McDonald’s omega (ω), which is the preferred reliability coefficient for factor-based scales because, unlike Cronbach’s alpha, it does not assume equal factor loadings (McDonald 1999).
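To see what ω measures, it can be computed by hand. For a one-factor scale with no correlated residuals, ω = (Σλ)² / ((Σλ)² + Σθ), where λ are the loadings and θ the residual variances. The sketch below applies this formula to the standardised loadings (Std.all) from the CFA output above, so θ = 1 − λ². This is an illustration under those assumptions and not the tutorial's own reliability code; package-based estimates computed on the raw-score metric may differ slightly.

```r
# Standardised loadings (Std.all) copied from the CFA output above
std_loadings <- list(
  ANX = c(0.754, 0.742, 0.757),
  EFF = c(0.824, 0.733, 0.706),
  MOT = c(0.677, 0.782, 0.716)
)

# omega = (sum of loadings)^2 / [(sum of loadings)^2 + sum of residual variances]
# With standardised indicators, each residual variance equals 1 - loading^2.
omega_from_loadings <- function(lambda) {
  true_var  <- sum(lambda)^2
  error_var <- sum(1 - lambda^2)
  true_var / (true_var + error_var)
}

omega <- sapply(std_loadings, omega_from_loadings)
round(omega, 2)
```

Under these assumptions the three scales come out at roughly .80 (Anxiety), .80 (Self-Efficacy), and .77 (Motivation), all above the conventional .70 benchmark.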
Each oval represents a latent variable; each rectangle an observed indicator. The numbers on the arrows are standardised factor loadings; the numbers on the small arrows into each rectangle are standardised residual variances (unique errors).
Exercises: CFA
Q1. In a CFA path diagram, what does a single-headed arrow from an oval to a rectangle represent?
Q2. A CFA model returns CFI = .88 and RMSEA = .09. What is the most appropriate conclusion?
Q3. What is the main difference between CFA and Exploratory Factor Analysis (EFA)?
Full Structural Equation Model
Section Overview
What you will learn: How to extend a CFA measurement model by adding directional structural paths between latent variables and outcomes.
Once we are satisfied with the measurement model, we add the structural paths — the directional hypotheses about how the latent variables relate to each other and to the writing score outcome. Our theoretical model predicts:
(1) Anxiety → Writing Score (negative effect: more anxious students perform worse)
(2) Self-Efficacy → Writing Score (positive effect)
(3) Self-Efficacy → Motivation (positive effect: more efficacious students are more motivated)
(4) Motivation → Writing Score (positive effect)
Path (3) combined with path (4) constitutes an indirect effect of Self-Efficacy on Writing Score through Motivation — a mediation hypothesis examined in Section 6.
Specifying the full SEM
In lavaan, structural paths are specified using the ~ operator, which is read as “is regressed on”:
Outcome ~ Predictor
We combine the measurement model with the structural paths in a single model string:
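The model string is not shown at this point in the text. Given the four hypothesised paths and the regressions reported in the output below, it would look like the following sketch. The object name `sem_model` is an assumption introduced here.

```r
# Full SEM: measurement model plus structural paths in one string.
# `sem_model` is an assumed object name.
sem_model <- '
  # Measurement model
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # Structural model
  MOT ~ EFF
  writing_score ~ ANX + EFF + MOT
'
```

Note that `writing_score` is an observed variable, so it appears directly in the regression without a `=~` definition.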
Exogenous variables have no incoming arrows (they are only predictors, never outcomes). In our model, ANX and EFF are exogenous latent variables.
Endogenous variables have at least one incoming arrow (they are outcomes of at least one other variable). MOT and writing_score are endogenous.
Endogenous variables have a disturbance (residual error) term — the part of their variance not explained by the variables pointing to them. lavaan estimates disturbances automatically.
Fitting the full SEM
We fit the full SEM using lavaan::sem(). The syntax is identical to cfa() but with the full model specification:
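The fitting call is not shown in this section. A minimal sketch follows, assuming the object names `sem_model` and `sem_fit` (the latter is used in later table code). If `semdata` is not in the session, a stand-in dataset with assumed values is generated, so its estimates will differ from the output printed below.

```r
library(lavaan)

# Stand-in data, created only if `semdata` is not already in the session
# (assumed values for illustration; not the tutorial's original simulation)
if (!exists("semdata")) {
  set.seed(1); n <- 300
  ANX <- rnorm(n); EFF <- rnorm(n)
  MOT <- 0.5 * EFF + rnorm(n, sd = sqrt(0.75))
  items <- function(f, stem) {
    setNames(as.data.frame(sapply(1:3, function(i) 0.75 * f + rnorm(n, sd = 0.66))),
             paste0(stem, 1:3))
  }
  semdata <- cbind(items(ANX, "anx"), items(EFF, "eff"), items(MOT, "mot"),
                   writing_score = 60 - 6 * ANX + 11 * EFF + 4 * MOT + rnorm(n, sd = 5))
}

sem_model <- '
  # Measurement model
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # Structural model
  MOT ~ EFF
  writing_score ~ ANX + EFF + MOT
'

# Fit the full model with the default ML estimator
sem_fit <- sem(sem_model, data = semdata)
summary(sem_fit, fit.measures = TRUE, standardized = TRUE)

# Proportion of variance explained in each endogenous variable
lavInspect(sem_fit, "r2")
```

The R² values can also be read off the standardised output: R² equals one minus the standardised residual variance (Std.all) of an endogenous variable.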
lavaan 0.6-21 ended normally after 56 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 24
Number of observations 300
Model Test User Model:
Test statistic 16.790
Degrees of freedom 31
P-value (Chi-square) 0.982
Model Test Baseline Model:
Test statistic 1250.761
Degrees of freedom 45
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000
Tucker-Lewis Index (TLI) 1.017
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -4417.659
Loglikelihood unrestricted model (H1) -4409.264
Akaike (AIC) 8883.317
Bayesian (BIC) 8972.208
Sample-size adjusted Bayesian (SABIC) 8896.094
Root Mean Square Error of Approximation:
RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.000
P-value H_0: RMSEA <= 0.050 1.000
P-value H_0: RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.023
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX =~
anx1 1.000 0.765 0.749
anx2 0.942 0.084 11.160 0.000 0.721 0.742
anx3 0.978 0.086 11.339 0.000 0.748 0.762
EFF =~
eff1 1.000 0.809 0.801
eff2 0.973 0.072 13.596 0.000 0.787 0.751
eff3 0.859 0.067 12.790 0.000 0.695 0.714
MOT =~
mot1 1.000 0.672 0.685
mot2 1.087 0.106 10.269 0.000 0.730 0.759
mot3 1.044 0.103 10.096 0.000 0.702 0.733
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
MOT ~
EFF 0.435 0.065 6.695 0.000 0.524 0.524
writing_score ~
ANX -7.257 0.795 -9.126 0.000 -5.550 -0.376
EFF 12.811 1.001 12.801 0.000 10.365 0.702
MOT 5.220 1.060 4.926 0.000 3.507 0.238
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ANX ~~
EFF -0.006 0.044 -0.145 0.885 -0.010 -0.010
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.anx1 0.457 0.054 8.528 0.000 0.457 0.439
.anx2 0.424 0.049 8.699 0.000 0.424 0.450
.anx3 0.404 0.049 8.228 0.000 0.404 0.420
.eff1 0.367 0.040 9.051 0.000 0.367 0.359
.eff2 0.478 0.048 10.006 0.000 0.478 0.435
.eff3 0.464 0.044 10.480 0.000 0.464 0.490
.mot1 0.511 0.054 9.499 0.000 0.511 0.531
.mot2 0.393 0.049 7.982 0.000 0.393 0.424
.mot3 0.423 0.049 8.586 0.000 0.423 0.462
.writing_score 28.004 5.613 4.989 0.000 28.004 0.128
ANX 0.585 0.086 6.836 0.000 1.000 1.000
EFF 0.655 0.082 7.938 0.000 1.000 1.000
.MOT 0.328 0.058 5.651 0.000 0.726 0.726
Structural path estimates
Code
# Extract the standardised structural paths (rows with the ~ operator)
sem_paths_df <- lavaan::standardizedsolution(sem_fit) |>
  dplyr::filter(op == "~") |>
  dplyr::select(Outcome = lhs, Predictor = rhs,
                Std_Estimate = est.std, SE = se,
                z = z, p = pvalue) |>
  dplyr::mutate(
    across(where(is.numeric), ~ round(.x, 3)),
    # Significance stars based on the p-value
    Sig = dplyr::case_when(
      p < .001 ~ "***",
      p < .01  ~ "**",
      p < .05  ~ "*",
      TRUE     ~ ""
    )
  )

# Display as a formatted table
sem_paths_df |>
  flextable() |>
  flextable::set_table_properties(width = .90, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised structural path coefficients from the full SEM.") |>
  flextable::border_outer()
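| Outcome | Predictor | Std_Estimate | SE | z | p | Sig |
|---|---|---|---|---|---|---|
| MOT | EFF | 0.524 | 0.058 | 8.989 | 0 | *** |
| writing_score | ANX | -0.376 | 0.038 | -10.003 | 0 | *** |
| writing_score | EFF | 0.702 | 0.041 | 17.174 | 0 | *** |
| writing_score | MOT | 0.238 | 0.045 | 5.234 | 0 | *** |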
Standardised path coefficients can be interpreted similarly to standardised regression coefficients (β): they indicate the expected change in the outcome (in standard deviation units) for a one-standard-deviation increase in the predictor, holding all other predictors constant.
Q1. In the lavaan model syntax, what does the ~ operator specify?
Q2. A standardised structural path coefficient of β = −0.42 (p < .001) from Anxiety to Writing Score means:
Mediation Analysis
Section Overview
What you will learn: How to test mediation hypotheses — indirect effects of one variable on another via a third — within an SEM framework.
Key concepts: Direct effects, indirect effects, total effects, bootstrapped confidence intervals.
What is mediation?
Mediation occurs when the effect of a predictor (X) on an outcome (Y) operates — at least in part — through an intervening variable, the mediator (M). Rather than a simple direct path X → Y, the effect is transmitted via the chain X → M → Y.
In our example, the theoretical mediation hypothesis is:
Self-Efficacy (EFF) influences Writing Score both directly and indirectly by increasing Motivation (MOT), which in turn improves Writing Score.
This decomposes the total effect of Self-Efficacy on Writing Score into a direct effect (EFF → writing_score), an indirect effect via Motivation (EFF → MOT → writing_score), and the total effect (direct + indirect).
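The decomposition is simple arithmetic once the path estimates are known. As a worked illustration, the sketch below plugs in the unstandardised estimates from the Regressions block of the full SEM output shown earlier, on the assumption that the labelled mediation model has the same structure and therefore yields the same estimates.

```r
# Unstandardised estimates copied from the "Regressions" block of the SEM output
a      <- 0.435    # EFF -> MOT
b      <- 5.220    # MOT -> writing_score
c_path <- 12.811   # EFF -> writing_score (direct effect)

indirect <- a * b               # effect transmitted through Motivation
total    <- c_path + indirect   # direct + indirect

round(c(direct = c_path, indirect = indirect, total = total), 2)
```

On these figures, a one-unit increase in Self-Efficacy corresponds to about 2.27 writing-score points via Motivation and about 15.08 points in total, so most of the effect is direct.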
Specifying mediation in lavaan
lavaan uses labels to name individual paths, which can then be combined using the := operator to define new parameters such as indirect and total effects. Labels are assigned by prefixing a path coefficient with a name followed by *:
Code
mediation_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (labelled for mediation) ---
  MOT ~ a * EFF                    # path a: EFF -> MOT
  writing_score ~ b * MOT          # path b: MOT -> writing_score
  writing_score ~ c * EFF + ANX    # path c: direct EFF -> writing_score

  # --- Defined parameters ---
  indirect := a * b                # indirect effect of EFF via MOT
  total := c + (a * b)             # total effect of EFF on writing_score
'
Bootstrapped confidence intervals for indirect effects
Indirect effects are the product of two path coefficients (a × b). Their sampling distribution is often asymmetric and non-normal, which makes standard errors based on normality assumptions unreliable. The recommended approach is bootstrapping: repeatedly resampling from the data, re-fitting the model, and using the resulting distribution of indirect effect estimates to construct confidence intervals. If the 95% bootstrapped CI does not contain zero, the indirect effect is statistically significant (Fuoli 2022; Kline 2023).
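The bootstrap fit is not shown in this section. One way to obtain bootstrapped confidence intervals in lavaan is sketched below. The object names `med_fit` and `med_est`, the seed, and the number of draws are assumptions, and a stand-in dataset with assumed values is generated if `semdata` is missing.

```r
library(lavaan)

# Stand-in data, created only if `semdata` is not already in the session
# (assumed values for illustration; not the tutorial's original simulation)
if (!exists("semdata")) {
  set.seed(1); n <- 300
  ANX <- rnorm(n); EFF <- rnorm(n)
  MOT <- 0.5 * EFF + rnorm(n, sd = sqrt(0.75))
  items <- function(f, stem) {
    setNames(as.data.frame(sapply(1:3, function(i) 0.75 * f + rnorm(n, sd = 0.66))),
             paste0(stem, 1:3))
  }
  semdata <- cbind(items(ANX, "anx"), items(EFF, "eff"), items(MOT, "mot"),
                   writing_score = 60 - 6 * ANX + 11 * EFF + 4 * MOT + rnorm(n, sd = 5))
}

mediation_model <- '
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
  MOT ~ a * EFF
  writing_score ~ b * MOT
  writing_score ~ c * EFF + ANX
  indirect := a * b
  total := c + (a * b)
'

# Bootstrap standard errors; 200 draws keeps the sketch fast,
# but 1,000 or more are advisable for reported results
set.seed(123)
med_fit <- sem(mediation_model, data = semdata,
               se = "bootstrap", bootstrap = 200)

# Percentile bootstrap confidence intervals for all parameters
med_est <- parameterEstimates(med_fit, boot.ci.type = "perc", level = 0.95)

# Keep only the defined parameters (indirect and total effects)
subset(med_est, op == ":=", select = c(lhs, est, se, ci.lower, ci.upper))
```

The percentile interval is read directly from the bootstrap distribution, so it can be asymmetric around the point estimate, which suits the skewed distribution of a product term.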
To interpret mediation, we examine: (1) Path a (EFF → MOT): Is Self-Efficacy a significant predictor of Motivation? (2) Path b (MOT → writing_score): Is Motivation a significant predictor of Writing Score (controlling for other predictors)? (3) Indirect effect (a × b): Is the product significant, as indicated by a 95% CI that excludes zero? (4) Direct effect c (EFF → writing_score): Does Self-Efficacy still predict Writing Score after accounting for the mediation?
If the indirect effect and the direct effect are both significant, we have partial mediation: Motivation carries part of the effect of Self-Efficacy to Writing Score, but Self-Efficacy also has an effect above and beyond that mediated path. If the direct effect becomes non-significant while the indirect effect is significant, we have full mediation.
A note on causal language
Mediation analysis is often discussed in causal terms (“X causes Y through M”). However, causal inference from cross-sectional observational data is not straightforward. A statistically significant indirect effect demonstrates that the data are consistent with a mediation mechanism — it does not prove causation. To make stronger causal claims, researchers need longitudinal designs, experimental manipulation of the mediator, or other causal identification strategies (Kline 2023).
Exercises: Mediation
Q1. What is the indirect effect in a mediation model?
Q2. Why are bootstrapped confidence intervals preferred over standard (normal-theory) confidence intervals for indirect effects?
Model Comparison and Modification
Section Overview
What you will learn: How to compare alternative SEM specifications using formal tests and fit indices, and how to use modification indices responsibly.
In practice, researchers often have competing theoretical models — alternative specifications that make different predictions about which paths should be present or absent. SEM provides tools for formally comparing such models. Two situations arise:
Nested models: Model A is a special case of Model B (Model A is Model B with one or more paths fixed to zero). These can be compared with a chi-square difference test (Δχ²).
Non-nested models: Neither model is a special case of the other. These are compared using information criteria (AIC, BIC): lower values indicate better fit, penalised for model complexity.
Comparing a constrained model
Suppose a reviewer argues that the direct path from Self-Efficacy to Writing Score is unnecessary and that all of Self-Efficacy’s influence on Writing Score is mediated through Motivation. We test this by fitting a constrained model with the direct EFF → writing_score path removed:
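The code for this comparison is not shown in this section. A minimal sketch follows, assuming the object names `sem_fit` and `constrained_fit` (both used in the AIC/BIC table code below) and hypothetical names for the model strings. A stand-in dataset with assumed values is generated if `semdata` is missing.

```r
library(lavaan)

# Stand-in data, created only if `semdata` is not already in the session
# (assumed values for illustration; not the tutorial's original simulation)
if (!exists("semdata")) {
  set.seed(1); n <- 300
  ANX <- rnorm(n); EFF <- rnorm(n)
  MOT <- 0.5 * EFF + rnorm(n, sd = sqrt(0.75))
  items <- function(f, stem) {
    setNames(as.data.frame(sapply(1:3, function(i) 0.75 * f + rnorm(n, sd = 0.66))),
             paste0(stem, 1:3))
  }
  semdata <- cbind(items(ANX, "anx"), items(EFF, "eff"), items(MOT, "mot"),
                   writing_score = 60 - 6 * ANX + 11 * EFF + 4 * MOT + rnorm(n, sd = 5))
}

# Shared measurement model (hypothetical helper string)
measurement <- '
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3
'

# Full model: includes the direct EFF -> writing_score path
sem_model <- paste(measurement, '
  MOT ~ EFF
  writing_score ~ ANX + EFF + MOT
')

# Constrained model: direct EFF -> writing_score path removed (fixed to zero)
constrained_model <- paste(measurement, '
  MOT ~ EFF
  writing_score ~ ANX + MOT
')

sem_fit         <- sem(sem_model, data = semdata)
constrained_fit <- sem(constrained_model, data = semdata)

# Chi-square difference (likelihood ratio) test for the two nested models
lavTestLRT(sem_fit, constrained_fit)
```

Because the two models differ by exactly one path, the difference test has one degree of freedom.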
A significant Δχ² (p < .05) indicates that the constrained model fits significantly worse — that is, removing the direct path causes a significant deterioration in fit, providing evidence that the direct path contributes meaningfully and should be retained.
Code
# Collect AIC and BIC for both models and display as a formatted table
data.frame(
  Model = c("Full model (with direct EFF path)",
            "Constrained model (no direct EFF path)"),
  AIC = round(c(AIC(sem_fit), AIC(constrained_fit)), 1),
  BIC = round(c(BIC(sem_fit), BIC(constrained_fit)), 1)
) |>
  flextable() |>
  flextable::set_table_properties(width = .80, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Model comparison: AIC and BIC for the full and constrained models.") |>
  flextable::border_outer()
| Model | AIC | BIC |
|---|---|---|
| Full model (with direct EFF path) | 8,883.3 | 8,972.2 |
| Constrained model (no direct EFF path) | 9,003.1 | 9,088.3 |
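The preferred model is the one with the lower AIC and the lower BIC. Here the full model has the lower value on both criteria, so retaining the direct EFF path is supported. A difference of more than 10 in BIC is generally considered strong evidence in favour of the model with the lower value.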
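As a quick check on the table, the differences can be computed directly. The values below are copied by hand from the table, and the object names are illustrative.

```r
# AIC and BIC values copied from the model comparison table
fit_table <- data.frame(
  Model = c("Full", "Constrained"),
  AIC   = c(8883.3, 9003.1),
  BIC   = c(8972.2, 9088.3)
)

# Difference = constrained minus full; positive values favour the full model
delta_aic <- fit_table$AIC[2] - fit_table$AIC[1]
delta_bic <- fit_table$BIC[2] - fit_table$BIC[1]

round(c(delta_AIC = delta_aic, delta_BIC = delta_bic), 1)
```

Both differences come out above 100, far beyond the threshold of 10.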
Modification indices
If a model fits poorly, modification indices (MIs) can help diagnose which additional paths or covariances would most improve fit. Each MI indicates how much the overall model χ² would decrease if a particular currently-fixed parameter were freed.
Modification indices are a double-edged sword. They are useful for diagnosing systematic misfit (e.g., correlated residuals between items that share method variance). However, acting on every high MI and re-fitting the model is a form of capitalising on chance: the revised model will fit the current sample better but may not generalise. Four practices keep their use defensible:
Theory first: only free a parameter if there is a substantive, theoretically defensible reason to do so.
One at a time: modify one parameter, re-fit, re-inspect — do not free multiple parameters simultaneously.
Cross-validate: if sample size permits, split the data and use one half to explore modifications and the other to confirm them.
Report transparently: if modifications were made post-hoc, report this explicitly and distinguish the revised model from the originally hypothesised model.
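The sketch below shows one way to inspect modification indices with `lavaan::modindices()`. So that it runs on its own, it assumes lavaan's built-in `HolzingerSwineford1939` data and the classic three-factor CFA as stand-ins for the tutorial's model; the names `hs_model`, `hs_fit`, and `mi` are illustrative.

```r
library(lavaan)

# Assumption: built-in example data and model stand in for the tutorial's CFA
hs_model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
hs_fit <- lavaan::cfa(hs_model, data = HolzingerSwineford1939)

# Modification indices sorted from largest to smallest.
# 3.84 is the .05 critical value of chi-square with 1 df, so smaller MIs
# would not produce a significant improvement even if freed.
mi <- lavaan::modindices(hs_fit, sort. = TRUE, minimum.value = 3.84)

# mi  = expected drop in the model chi-square if the parameter is freed
# epc = expected value of the freed parameter (sepc.all = standardised)
mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")]
```

Rows with `op` equal to `=~` are candidate cross-loadings and rows with `~~` are candidate residual covariances. The table only ranks candidates; whether any of them should be freed remains a theoretical decision.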
Exercises: Model Comparison
Q1. What does a significant chi-square difference test (Δχ²) between two nested models indicate?
Q2. A modification index of 24.5 suggests adding a cross-loading of anx2 onto the EFF factor. Should you add this path?
Reporting Standards
Section Overview
What you will learn: What to report in an SEM study, model reporting paragraph templates, a workflow summary table, and a reporting checklist.
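Reporting SEM results clearly and completely is as important as the analysis itself. The checklist below groups the elements to report by stage of the analysis, beginning with model specification and estimation: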
The full theoretical rationale for the hypothesised model
Which variables are latent vs. observed; which indicators load onto which factors
Software and estimator used (e.g., “Models were estimated in R using the lavaan package (Rosseel 2012) with Maximum Likelihood estimation”)
Measurement model (CFA)
Standardised factor loadings for all indicators (with SEs and significance)
All model fit indices: χ²(df), CFI, TLI, RMSEA (with 90% CI), SRMR
Scale reliabilities (McDonald’s ω or Cronbach’s α)
Structural model
Standardised path coefficients (with SEs and significance)
R² for all endogenous variables
Model fit indices
Mediation (if applicable)
Labelled paths (a, b, c/c’), indirect effect, total effect
Bootstrapped confidence intervals (state number of resamples)
Whether partial or full mediation was found
Model comparisons (if applicable)
Δχ², Δdf, p-value for nested comparisons
AIC/BIC for non-nested comparisons
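Most items on this checklist can be pulled from a fitted lavaan object with a few calls. The sketch below assumes lavaan's built-in `PoliticalDemocracy` data and a small textbook model as stand-ins for the tutorial's model; `pd_model`, `pd_fit`, `fit_idx`, `std`, and `r2` are illustrative names.

```r
library(lavaan)

# Assumption: built-in example data and a small model stand in for the tutorial's SEM
pd_model <- '
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem60 ~ ind60
'
pd_fit <- lavaan::sem(pd_model, data = PoliticalDemocracy)

# Global fit indices: chi-square(df), CFI, TLI, RMSEA with 90% CI, SRMR
fit_idx <- lavaan::fitMeasures(
  pd_fit,
  c("chisq", "df", "pvalue", "cfi", "tli",
    "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr")
)
round(fit_idx, 3)

# Standardised loadings (op "=~") and structural paths (op "~") with SEs and p-values
std <- lavaan::standardizedSolution(pd_fit)
std[std$op %in% c("=~", "~"), c("lhs", "op", "rhs", "est.std", "se", "pvalue")]

# R-squared for every indicator and endogenous variable
r2 <- lavaan::lavInspect(pd_fit, "rsquare")
round(r2, 3)
```

Scale reliabilities are not part of this output; they are computed separately, as in the CFA section.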
Model reporting paragraphs
CFA
A three-factor measurement model was specified a priori based on the theoretical framework, with Language Anxiety (ANX), Writing Self-Efficacy (EFF), and Motivation (MOT) each indicated by three Likert-scale items (nine indicators in total). The model was estimated using Maximum Likelihood in R (lavaan; Rosseel 2012). Model fit was excellent: χ²(df) = X.XX, CFI = .97, TLI = .96, RMSEA = .04 [90% CI: .01, .07], SRMR = .04. All standardised factor loadings were significant and exceeded .70 (range: .71–.82), and McDonald’s ω exceeded .80 for all three scales, indicating good reliability. The measurement model was retained for subsequent structural analysis.
Full SEM
The structural model specified directional effects of Language Anxiety, Writing Self-Efficacy, and Motivation on Writing Score, and an effect of Self-Efficacy on Motivation. Model fit was acceptable: χ²(df) = X.XX, CFI = .96, TLI = .95, RMSEA = .05 [90% CI: .02, .07], SRMR = .05. Writing Self-Efficacy was a significant positive predictor of both Motivation (β = .55, SE = .07, p < .001) and Writing Score (β = .47, SE = .08, p < .001). Language Anxiety was a significant negative predictor of Writing Score (β = −.38, SE = .07, p < .001). Motivation significantly predicted Writing Score (β = .21, SE = .07, p = .003). Together, the predictors explained 58% of the variance in Writing Score.
Mediation
To test whether the effect of Writing Self-Efficacy on Writing Score was partially mediated by Motivation, we re-estimated the model with labelled paths and requested 1000 bootstrap resamples for inference on the indirect effect (Fuoli 2022). The indirect effect of Self-Efficacy on Writing Score via Motivation was significant (unstandardised b = X.XX, 95% BCa CI [X.XX, X.XX]), indicating that part of the positive effect of self-efficacy on writing performance operates through increased motivation. The direct effect of Self-Efficacy on Writing Score remained significant after accounting for this indirect path, supporting partial mediation.
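The analysis described in this paragraph can be sketched as follows. Several things here are assumptions made to keep the example short and self-contained: the data are a fresh simulation (so estimates will not match the tutorial's), Language Anxiety is left out of the model, and only 200 bootstrap resamples are drawn for speed where a real analysis would use 1,000 or more. The names `item()`, `meddata`, `med_model`, `med_fit`, and `n_boot` are illustrative.

```r
library(lavaan)

# Assumption: fresh simulation of the efficacy -> motivation -> writing score chain
set.seed(42)
n <- 300
efficacy <- rnorm(n)
motivat  <- 0.55 * efficacy + rnorm(n, 0, sqrt(1 - 0.55^2))

# Hypothetical helper: indicator = loading * latent + unique error
item <- function(latent, loading) {
  loading * latent + rnorm(length(latent), 0, sqrt(1 - loading^2))
}

meddata <- data.frame(
  eff1 = item(efficacy, 0.80), eff2 = item(efficacy, 0.74), eff3 = item(efficacy, 0.77),
  mot1 = item(motivat, 0.73), mot2 = item(motivat, 0.76), mot3 = item(motivat, 0.70),
  writing_score = 55 + 10 * efficacy + 4 * motivat + rnorm(n, 0, 6)
)

med_model <- '
  # Measurement model
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # Structural model with labelled paths
  MOT ~ a * EFF
  writing_score ~ b * MOT + c * EFF

  # Defined parameters: indirect and total effects
  indirect := a * b
  total    := c + a * b
'

n_boot <- 200  # kept low for speed; use 1,000 or more in practice

med_fit <- lavaan::sem(med_model, data = meddata,
                       se = "bootstrap", bootstrap = n_boot)

# Bootstrap confidence intervals for all parameters, including defined ones
pe <- lavaan::parameterEstimates(med_fit, boot.ci.type = "bca.simple", level = 0.95)
pe[pe$op == ":=", c("label", "est", "se", "ci.lower", "ci.upper")]
```

An indirect effect is usually treated as significant when its bootstrap interval excludes zero. Note that `boot.ci.type = "bca.simple"` is, per the lavaan documentation, a bias-corrected interval without the acceleration term; `"perc"` gives a plain percentile interval instead.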
Quick reference: SEM workflow
| Step | Action | Key R function(s) |
|---|---|---|
| 1. Theoretical specification | Draw path diagram; specify which indicators load onto which factors and which structural paths are hypothesised | — |
| 2. Descriptive checks | Examine distributions (skewness, kurtosis), correlations; check for multivariate outliers | psych::describe(); cor() |
| 3. Confirmatory Factor Analysis | Fit measurement model with lavaan::cfa() | lavaan::cfa() |
| 4. Evaluate measurement fit | Inspect CFI, TLI, RMSEA, SRMR against recommended thresholds | lavaan::fitMeasures() |
| 5. Assess reliability | Compute McDonald's omega with semTools::reliability() | semTools::reliability() |
| 6. Full SEM | Add structural paths; fit with lavaan::sem() | lavaan::sem() |
| 7. Mediation (if applicable) | Label paths; define indirect/total effects with ':='; use se = 'bootstrap' | lavaan::sem(se = 'bootstrap') |
| 8. Model comparison | Use lavTestLRT() for nested models; AIC/BIC for non-nested; consult MIs with theory | lavaan::lavTestLRT(); AIC(); modindices() |
| 9. Report | Report all fit indices, standardised loadings, path coefficients, R², and effect CIs | |
Martin Schweinberger. 2026. Structural Equation Modelling in R. The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia. url: https://ladal.edu.au/tutorials/sem/sem.html (Version 2026.03.28), doi: 10.5281/zenodo.19332953.
@manual{martinschweinberger2026structural,
author = {Martin Schweinberger},
title = {Structural Equation Modelling in R},
year = {2026},
note = {https://ladal.edu.au/tutorials/sem/sem.html},
organization = {The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia},
edition = {2026.03.28},
doi = {10.5281/zenodo.19332953}
}
This tutorial was re-developed with the assistance of Claude (claude.ai), a large language model created by Anthropic. Claude was used to help revise the tutorial text, structure the instructional content, generate the R code examples, and write the checkdown quiz questions and feedback strings. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.
Anderson, James C., and David W. Gerbing. 1988. “Structural Equation Modeling in Practice: A Review and Recommended Two-Step Approach.” Psychological Bulletin 103 (3): 411–23.
Fuoli, Matteo. 2022. “Structural Equation Modeling in R: A Practical Introduction for Linguists.” Data Analytics Cogn. Linguistics: Methods Insights 41.
Hair, Joseph F., William C. Black, Barry J. Babin, and Rolph E. Anderson. 2019. Multivariate Data Analysis. 8th ed. Andover: Cengage.
Hu, Li-tze, and Peter M. Bentler. 1999. “Cutoff Criteria for Fit Indexes in Covariance Structure Analysis: Conventional Criteria Versus New Alternatives.” Structural Equation Modeling: A Multidisciplinary Journal 6 (1): 1–55.
Jackson, Dennis L., J. Arthur Gillaspy, and Rebecca Purc-Stephenson. 2009. “Reporting Practices in Confirmatory Factor Analysis: An Overview and Some Recommendations.” Psychological Methods 14 (1): 6–23.
Kline, Rex B. 2023. Principles and Practice of Structural Equation Modeling. 5th ed. New York: Guilford Press.
Larsson, Tove, Luke Plonsky, and Gregory R. Hancock. 2021. “On the Benefits of Structural Equation Modeling for Corpus Linguists.” Corpus Linguistics and Linguistic Theory 17 (3): 683–714.
McDonald, Roderick P. 1999. Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates.
Nunnally, Jum C. 1978. Psychometric Theory. 2nd ed. New York: McGraw-Hill.
Rosseel, Yves. 2012. “Lavaan: An R Package for Structural Equation Modeling.” Journal of Statistical Software 48: 1–36.
Source Code
---title: "Structural Equation Modelling in R"author: "Martin Schweinberger"date: "2026"params: title: "Structural Equation Modelling in R" author: "Martin Schweinberger" year: "2026" version: "2026.03.28" url: "https://ladal.edu.au/tutorials/sem/sem.html" institution: "The Language Technology and Data Analysis Laboratory (LADAL), The University of Queensland, Australia" description: "This tutorial introduces structural equation modelling (SEM) in R using lavaan, covering confirmatory factor analysis, path diagrams, model specification, global fit indices (CFI, RMSEA, SRMR), and model comparison using AIC and BIC. It is aimed at researchers in psycholinguistics, applied linguistics, and the social sciences who need to model complex relationships among multiple variables simultaneously." doi: "10.5281/zenodo.19332953"format: html: toc: true toc-depth: 4 code-fold: show code-tools: true theme: cosmo---```{r setup, echo=FALSE, message=FALSE, warning=FALSE}library(checkdown)library(dplyr)library(ggplot2)library(tidyr)library(flextable)library(lavaan)library(semPlot)library(semTools)library(psych)options(stringsAsFactors = FALSE)options("scipen" = 100, "digits" = 12)```{ width=100% }# Introduction {#intro}{ width=15% style="float:right; padding:10px" }This tutorial introduces **Structural Equation Modelling (SEM)** — a powerful and flexible family of multivariate statistical techniques that allows researchers to simultaneously model multiple relationships among variables, account for measurement error, and test theories about constructs that cannot be directly observed. 
Where [simple linear regression](/tutorials/regression/regression.html) models a single outcome from one or more predictors, SEM can model entire systems of relationships, including situations where the same variable acts as both a predictor and an outcome, and where some of the most important variables in a theory are not directly measurable at all.SEM is particularly well suited to the language sciences. Much of what linguists and applied linguists care about — *language anxiety*, *motivation*, *metalinguistic awareness*, *communicative competence*, *reading ability* — cannot be captured in a single measurement. These are **latent constructs**: theoretical entities that we infer indirectly from a set of observable indicators such as questionnaire items or test scores. SEM provides a principled framework for doing exactly this, and then for examining how these latent constructs relate to one another and to observable outcomes.SEM is increasingly recognised as a valuable tool in corpus linguistics and cognitive linguistics. @larsson2021sem make the case that path models — a fundamental building block of SEM — are well suited to the multivariate nature of corpus-linguistic data, enabling researchers to move beyond monofactorial analyses and test theoretically motivated causal structures. @fuoli2022sem provides a step-by-step introduction to SEM in R for linguists working in a cognitive-linguistic framework, demonstrating its utility for modelling the psychological effects of linguistic choices. @rosseel2012lavaan's `lavaan` package, which we use throughout this tutorial, has made full-featured SEM freely available in R.This tutorial is aimed at **beginners with no prior exposure to SEM**. You do not need to have studied factor analysis or path analysis before, though familiarity with basic regression is helpful. 
The goal is to build conceptual understanding from the ground up and to equip you with the practical skills to fit, evaluate, and report SEM models in R.::: {.callout-note}## Learning ObjectivesBy the end of this tutorial you will be able to:1. Explain the distinction between observed and latent variables and describe why measurement error matters2. Identify the two building blocks of a full SEM — the measurement model and the structural model — and describe what each specifies3. Read and interpret a standard SEM path diagram4. Specify a Confirmatory Factor Analysis (CFA) in `lavaan` model syntax5. Evaluate a CFA using model fit indices (CFI, TLI, RMSEA, SRMR) and reliability coefficients (McDonald's ω)6. Extend a measurement model to a full SEM by adding structural paths7. Interpret standardised path coefficients and R² values from a full SEM8. Test mediation hypotheses using labelled paths and bootstrapped confidence intervals9. Compare nested and non-nested SEM specifications using Δχ², AIC, and BIC10. Use modification indices responsibly to diagnose model misfit11. Report SEM results in accordance with current best-practice conventions in linguistics and applied linguistics:::::: {.callout-note}## Prerequisite TutorialsBefore working through this tutorial, we recommend familiarity with the following:- [Introduction to Quantitative Reasoning](/tutorials/introquant/introquant.html)- [Basic Concepts in Quantitative Research](/tutorials/basicquant/basicquant.html)- [Descriptive Statistics](/tutorials/dstats/dstats.html)- [Basic Inferential Statistics](/tutorials/basicstatz/basicstatz.html)- [Simple and Multiple Linear Regression](/tutorials/regression/regression.html)- [Getting started with R](/tutorials/intror/intror.html)- [Loading, saving, and generating data in R](/tutorials/load/load.html):::::: {.callout-note}## Citation```{r citation-callout-top, echo=FALSE, results='asis'}cat( params$author, ". ", params$year, ". *", params$title, "*. 
", params$institution, ". ", "url: ", params$url, " ", "(Version ", params$version, ").", sep = "")```:::---## Preparation and Session Set-up {-}Install required packages once:```{r prep1, echo=TRUE, eval=FALSE, message=FALSE, warning=FALSE}install.packages("lavaan")install.packages("semPlot")install.packages("semTools")install.packages("psych")install.packages("dplyr")install.packages("ggplot2")install.packages("tidyr")install.packages("flextable")install.packages("checkdown")```Load packages for this session:```{r load-packages, message=FALSE, warning=FALSE}library(lavaan) # SEM and CFA estimationlibrary(semPlot) # path diagram visualisationlibrary(semTools) # reliability and model comparison toolslibrary(psych) # descriptive statistics and correlation matriceslibrary(dplyr) # data manipulationlibrary(ggplot2) # data visualisationlibrary(tidyr) # data reshapinglibrary(flextable) # formatted tableslibrary(checkdown) # interactive quiz questions```---## The Dataset {-}Throughout this tutorial we use a **simulated dataset** inspired by research on second-language (L2) writing. The data represent 300 university students who completed a battery of questionnaire scales and an academic writing task. 
The dataset includes:- **Language Anxiety** (`anx1`–`anx3`): three Likert-scale items measuring the degree to which students feel anxious when writing in their L2 (higher = more anxious)- **Writing Self-Efficacy** (`eff1`–`eff3`): three items measuring students' confidence in their L2 writing ability (higher = greater self-efficacy)- **Motivation** (`mot1`–`mot3`): three items measuring students' intrinsic motivation to improve their L2 writing (higher = more motivated)- **Writing Score** (`writing_score`): a holistic score (0–100) assigned by trained raters to an in-class academic writing taskBecause the data are simulated in R, no external file is needed — you can reproduce the entire analysis from the code below.```{r simulate-data, message=FALSE, warning=FALSE}set.seed(42)n <- 300# Latent variable scoresanxiety <- rnorm(n, 0, 1)efficacy <- rnorm(n, 0, 1)motivat <- 0.55 * efficacy + rnorm(n, 0, sqrt(1 - 0.55^2))# Observed indicators (loading * latent + unique error)anx1 <- 0.78 * anxiety + rnorm(n, 0, sqrt(1 - 0.78^2))anx2 <- 0.72 * anxiety + rnorm(n, 0, sqrt(1 - 0.72^2))anx3 <- 0.80 * anxiety + rnorm(n, 0, sqrt(1 - 0.80^2))eff1 <- 0.80 * efficacy + rnorm(n, 0, sqrt(1 - 0.80^2))eff2 <- 0.74 * efficacy + rnorm(n, 0, sqrt(1 - 0.74^2))eff3 <- 0.77 * efficacy + rnorm(n, 0, sqrt(1 - 0.77^2))mot1 <- 0.73 * motivat + rnorm(n, 0, sqrt(1 - 0.73^2))mot2 <- 0.76 * motivat + rnorm(n, 0, sqrt(1 - 0.76^2))mot3 <- 0.70 * motivat + rnorm(n, 0, sqrt(1 - 0.70^2))# Outcome: Writing Scorewriting_score <- 55 + 10 * efficacy - 6 * anxiety + 4 * motivat + rnorm(n, 0, 6)writing_score <- round(pmin(pmax(writing_score, 10), 100))# Assemble data framesemdata <- data.frame(anx1, anx2, anx3, eff1, eff2, eff3, mot1, mot2, mot3, writing_score)``````{r view-data, echo=FALSE, message=FALSE, warning=FALSE}semdata |> head(10) |> flextable() |> flextable::set_table_properties(width = .99, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::fontsize(size 
= 11, part = "header") |> flextable::align_text_col(align = "center") |> flextable::set_caption(caption = "First ten rows of the simulated L2 writing dataset (n = 300).") |> flextable::border_outer()```---# Conceptual Foundations {#concepts}::: {.callout-note}## Section Overview**What you will learn:** The core ideas underpinning SEM — latent variables, measurement error, path diagrams, and the two-component structure of a full SEM.**Why it matters:** SEM notation and vocabulary are quite different from ordinary regression. Building a solid conceptual foundation before fitting models prevents common misinterpretations.:::## Observed vs. latent variables {-}A fundamental distinction in SEM is between variables you can observe directly and those you cannot.**Observed (manifest) variables** are things you actually measure and record: a Likert-scale item, a test score, a reaction time, a corpus frequency count. They appear as columns in your dataset.**Latent variables** are theoretical constructs that you cannot measure directly. *Language anxiety*, *motivation*, and *writing self-efficacy* are classic examples from applied linguistics. No single questionnaire item perfectly captures any of these constructs — each item is merely a *fallible indicator*. Latent variables are never columns in your dataset; instead, they are modelled as common causes of their observed indicators.This distinction matters because **measurement error is unavoidable** whenever we use observed items to represent theoretical constructs. If we ignore this error — for example, by averaging questionnaire items and treating the result as if it were the true construct — we introduce **attenuation bias** into our estimates of relationships. SEM addresses this explicitly: it partitions the variance in each observed indicator into a part explained by the underlying latent variable and a part attributed to unique error (random noise plus any systematic variance not shared with the other indicators). 
As @larsson2021sem note, treating latent variables such as motivation and proficiency as observed (e.g., by using composite scores) leads to underestimation of relationships, which is one of the key arguments for using SEM in language research.## The two building blocks of SEM {-}A full structural equation model is composed of two sub-models:| Sub-model | Technical name | What it specifies ||---|---|---|| **Measurement model** | Confirmatory Factor Analysis (CFA) | Which observed items are indicators of which latent variables; how strongly each item loads onto its construct; how much unique error each item has || **Structural model** | Path model | Directional relationships among latent variables (and between latent variables and observed outcomes); regression-like paths encoding theoretical predictions |In the standard **two-step approach** to SEM [@anderson1988sem], researchers first establish an adequate measurement model (Step 1) before testing the structural paths of theoretical interest (Step 2). This tutorial follows this workflow: we build and evaluate a CFA in [Section 3](#cfa) and then add structural paths in [Section 5](#fullsem).## Path diagrams {-}SEM models are almost always communicated visually through **path diagrams**. The notation is standardised:| Symbol | Represents ||---|---|| **Rectangle** | Observed (manifest) variable || **Oval / ellipse** | Latent variable || **Single-headed arrow** (→) | Directional path (a regression-type effect) || **Double-headed curved arrow** (↔) | Covariance or correlation || **Small arrow into rectangle** | Residual / unique error for that indicator || **Small arrow into oval** | Disturbance (residual error for an endogenous latent variable) |In a **measurement model**, ovals point to rectangles: the latent construct is hypothesised to *cause* variation in its observed indicators. 
In a **structural model**, ovals point to other ovals, encoding directional theoretical predictions among constructs.::: {.callout-important}## SEM is a confirmatory, theory-driven methodUnlike Exploratory Factor Analysis (EFA), which discovers factor structure empirically from the data, SEM requires the researcher to **specify the model in advance** based on theory. Every path in the diagram — every arrow that is included or excluded — reflects a theoretical decision. A good model fit indicates that the specified model is *consistent* with the data; it does not prove the model is the only correct one. Alternative models that fit equally well are always possible (this is the **problem of equivalent models**). Always ground your SEM specifications in theory, not post-hoc data exploration [@kline2023principles].:::## A conceptual map of our example {-}Our theoretical model for the L2 writing dataset can be described as follows:1. **Language Anxiety**, **Writing Self-Efficacy**, and **Motivation** are latent constructs, each measured by three questionnaire items.2. We expect Self-Efficacy and Anxiety to have opposite effects on **Writing Score**: greater self-efficacy should improve performance; greater anxiety should impair it.3. Self-Efficacy is also expected to influence Motivation (students who feel more capable tend to be more motivated), and Motivation may in turn have a positive effect on Writing Score. This indirect path constitutes a **mediation** hypothesis.This conceptual model drives all the analytic choices that follow.---# Descriptive Statistics and Correlations {#descriptives}::: {.callout-note}## Section Overview**What you will learn:** How to examine the observed variables before fitting any model.**Key steps:** Descriptive statistics, distribution checks, inter-item correlations.:::Before fitting any SEM, it is good practice to examine the distributions and inter-relationships of your observed variables. 
Severe non-normality or implausible correlations can signal problems that need to be addressed before modelling.## Descriptive statistics {-}```{r desc01, message=FALSE, warning=FALSE}psych::describe(semdata) |> dplyr::select(n, mean, sd, median, skew, kurtosis, min, max) |> round(3) |> tibble::rownames_to_column("Variable") |> flextable() |> flextable::set_table_properties(width = .99, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::fontsize(size = 11, part = "header") |> flextable::align_text_col(align = "left") |> flextable::set_caption(caption = "Descriptive statistics for all observed variables.") |> flextable::border_outer()```All items are centred near zero (as expected for standardised simulated data). Skewness values are within the acceptable range of [−1, +1] for all items, meaning that the normality assumption required for maximum likelihood estimation in `lavaan`[@rosseel2012lavaan] is not substantially violated.## Correlation matrix {-}A correlation matrix helps us verify that items within the same scale correlate with each other (convergent evidence) and that items from different scales correlate less strongly (discriminant evidence).```{r corr01, message=FALSE, warning=FALSE}cor_mat <- cor(semdata |> dplyr::select(-writing_score)) |> round(2)cor_mat |> as.data.frame() |> tibble::rownames_to_column("Variable") |> flextable() |> flextable::set_table_properties(width = .99, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 10) |> flextable::fontsize(size = 10, part = "header") |> flextable::align_text_col(align = "center") |> flextable::set_caption(caption = "Pearson correlation matrix for the nine questionnaire items.") |> flextable::border_outer()``````{r corr-viz, message=FALSE, warning=FALSE}cor_long <- cor_mat |> as.data.frame() |> tibble::rownames_to_column("Var1") |> tidyr::pivot_longer(-Var1, names_to = "Var2", values_to = "r")ggplot(cor_long, aes(x = Var1, y = Var2, 
fill = r)) + geom_tile(color = "white") + geom_text(aes(label = round(r, 2)), size = 3.2) + scale_fill_gradient2(low = "tomato", mid = "white", high = "steelblue", midpoint = 0, limits = c(-1, 1), name = "r") + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), panel.grid = element_blank()) + labs(title = "Correlation heatmap: nine questionnaire items", x = "", y = "")```The heatmap confirms the expected pattern: items within each scale (e.g., `anx1`–`anx3`) correlate strongly with each other and more weakly with items from the other scales. The efficacy and motivation items show moderate cross-scale correlations, consistent with our theoretical expectation that the two constructs are related.---# Confirmatory Factor Analysis (CFA) {#cfa}::: {.callout-note}## Section Overview**What you will learn:** How to specify, fit, and evaluate a measurement model using CFA in `lavaan`.**Key concepts:** Factor loadings, model fit indices, reliability, convergent and discriminant validity.**Why CFA before SEM:** The measurement model must be established before structural paths are meaningful. If your indicators do not adequately reflect the intended latent constructs, the structural estimates will be uninterpretable.:::## What is Confirmatory Factor Analysis? {-}**Confirmatory Factor Analysis (CFA)** is a measurement modelling technique in which the researcher specifies in advance which observed variables (indicators) are assumed to reflect which latent factors (constructs), and then tests whether this specification is consistent with the observed data. 
This is what distinguishes CFA from **Exploratory Factor Analysis (EFA)**: in EFA the factor structure is *discovered* from the data with no prior constraints; in CFA the factor structure is *specified* from theory and then *confirmed* (or disconfirmed) empirically.In our example, we hypothesise three latent factors:- **Anxiety** (*ANX*), indicated by `anx1`, `anx2`, `anx3`- **Self-Efficacy** (*EFF*), indicated by `eff1`, `eff2`, `eff3`- **Motivation** (*MOT*), indicated by `mot1`, `mot2`, `mot3`## Specifying a CFA model in `lavaan` {-}The `lavaan` package [@rosseel2012lavaan] uses a simple, readable model syntax. The key operator for defining a measurement model is `=~` which is read as *"is measured by"* or *"is indicated by"*:```LatentVariable =~ indicator1 + indicator2 + indicator3```We specify our three-factor measurement model as follows:```{r cfa-spec, message=FALSE, warning=FALSE}cfa_model <- ' # Measurement model ANX =~ anx1 + anx2 + anx3 EFF =~ eff1 + eff2 + eff3 MOT =~ mot1 + mot2 + mot3'```::: {.callout-note}## `lavaan` model syntax at a glance| Operator | Meaning | Example ||---|---|---|| `=~` | Measured by (latent → indicator) | `ANX =~ anx1 + anx2` || `~` | Regressed on (structural path) | `MOT ~ EFF` || `~~` | Correlated with (covariance) | `ANX ~~ EFF` || `~1` | Intercept / mean | `anx1 ~ 1` |By default, `lavaan` fixes the first indicator loading to 1.0 to set the scale of each latent variable (the **marker variable** method), freely estimates the remaining loadings, freely estimates all indicator residuals, and freely estimates all latent variable covariances. You can change these defaults using arguments to `cfa()` or `sem()`.:::## Fitting the CFA model {-}We fit the model using `lavaan::cfa()`. 
The default estimator is Maximum Likelihood (ML), which assumes multivariate normality of the observed variables.```{r cfa-fit, message=FALSE, warning=FALSE}cfa_fit <- lavaan::cfa(cfa_model, data = semdata, estimator = "ML")summary(cfa_fit, fit.measures = TRUE, standardized = TRUE)```This output contains three major sections: **model fit information**, **factor loadings** (both unstandardised and standardised), and **latent variable covariances**.## Interpreting factor loadings {-}Factor loadings express how strongly each indicator is related to its underlying latent variable. In the **standardised solution** (column `Std.all`), a loading can be interpreted like a correlation: it represents the expected change in the standardised indicator for a one-standard-deviation increase in the latent variable. Standardised loadings above **0.50** are generally considered acceptable; loadings above **0.70** are considered strong [@hair2019multivariate].```{r cfa-loadings, message=FALSE, warning=FALSE}loadings_df <- lavaan::standardizedsolution(cfa_fit) |> dplyr::filter(op == "=~") |> dplyr::select(Latent = lhs, Indicator = rhs, Std_Loading = est.std, SE = se, z = z, p = pvalue) |> dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))loadings_df |> flextable() |> flextable::set_table_properties(width = .85, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::fontsize(size = 11, part = "header") |> flextable::align_text_col(align = "left") |> flextable::set_caption(caption = "Standardised CFA factor loadings with standard errors and significance tests.") |> flextable::border_outer()```All standardised loadings should exceed 0.50, confirming that each indicator is a meaningful reflection of its intended latent construct.## Model fit assessment {-}Fitting a CFA does not automatically produce a good model. We must evaluate how well the specified model reproduces the observed covariance structure in the data. 
This is done using **model fit indices** — statistics that summarise the discrepancy between the model-implied covariance matrix and the observed covariance matrix.::: {.callout-important}## Model fit indices: what they mean and which cut-offs to useNo single fit index is sufficient. Report a combination of the following:| Index | Full name | What it measures | Acceptable | Good ||---|---|---|---|---|| **χ²** | Chi-square test | Overall model misfit (sensitive to N) | *p* > .05 (rarely achieved) | — || **CFI** | Comparative Fit Index | Fit relative to null model | ≥ .90 | ≥ .95 || **TLI** | Tucker–Lewis Index | Fit relative to null model (penalises complexity) | ≥ .90 | ≥ .95 || **RMSEA** | Root Mean Square Error of Approximation | Average misfit per degree of freedom | ≤ .08 | ≤ .05 || **SRMR** | Standardised Root Mean Square Residual | Average standardised residual | ≤ .08 | ≤ .05 |Cut-offs are from @hu1999cutoff. These are guidelines, not hard thresholds — model fit must always be evaluated in the context of model complexity and sample size [@kline2023principles].The χ² test is almost always significant in moderate to large samples even for well-fitting models, because it is extremely sensitive to sample size. 
It is therefore standard practice to rely on the incremental and approximate fit indices (CFI, TLI, RMSEA, SRMR) rather than on χ² alone [@fuoli2022sem].:::```{r cfa-fitindices, message=FALSE, warning=FALSE}fit_indices <- lavaan::fitMeasures(cfa_fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr")) |> round(3)data.frame( Index = c("chi-square", "df", "p (chi-square)", "CFI", "TLI", "RMSEA", "RMSEA 90% CI lower", "RMSEA 90% CI upper", "SRMR"), Value = as.numeric(fit_indices), Threshold = c("—", "—", "> .05", ">= .95", ">= .95", "<= .05", "—", "—", "<= .05")) |> flextable() |> flextable::set_table_properties(width = .70, layout = "autofit") |> flextable::theme_zebra() |> flextable::fontsize(size = 11) |> flextable::fontsize(size = 11, part = "header") |> flextable::align_text_col(align = "left") |> flextable::set_caption(caption = "CFA model fit indices with recommended thresholds (Hu & Bentler, 1999).") |> flextable::border_outer()```## Internal consistency reliability {-}Beyond model fit, we assess whether each scale is internally consistent — that is, whether the indicators of each latent variable reliably hang together. 
We use **McDonald's omega (ω)**, which is the preferred reliability coefficient for factor-based scales because, unlike Cronbach's alpha, it does not assume equal factor loadings [@mcdonald1999test].

```{r reliability, message=FALSE, warning=FALSE}
rel <- semTools::reliability(cfa_fit)

data.frame(
  Scale = c("ANX (Language Anxiety)", "EFF (Writing Self-Efficacy)", "MOT (Motivation)"),
  Omega = round(as.numeric(rel["omega", ]), 3),
  Alpha = round(as.numeric(rel["alpha", ]), 3)
) |>
  flextable() |>
  flextable::set_table_properties(width = .70, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "McDonald's omega and Cronbach's alpha for each scale.") |>
  flextable::border_outer()
```

Values of ω ≥ .70 are generally considered acceptable for research purposes; ω ≥ .80 is considered good [@nunnally1978psychometric].

## Visualising the measurement model {-}

The `semPlot` package produces path diagrams directly from a fitted `lavaan` object.

```{r cfa-plot, message=FALSE, warning=FALSE, fig.width=9, fig.height=6}
semPlot::semPaths(
  cfa_fit,
  what = "std",
  layout = "tree",
  rotation = 2,
  edge.label.cex = 0.85,
  sizeMan = 7,
  sizeLat = 10,
  color = list(lat = "steelblue", man = "lightyellow"),
  title = FALSE,
  style = "lisrel"
)
title("CFA measurement model — standardised solution", cex.main = 1)
```

Each oval represents a latent variable; each rectangle an observed indicator. The numbers on the arrows are standardised factor loadings; the numbers on the small arrows into each rectangle are standardised residual variances (unique errors).

---

::: {.callout-tip}
## Exercises: CFA
:::

**Q1.
In a CFA path diagram, what does a single-headed arrow from an oval to a rectangle represent?**

```{r}
#| echo: false
#| label: "CFA_Q1"
check_question(
  "The latent variable (oval) is hypothesised to cause variation in the observed indicator (rectangle)",
  options = c(
    "The latent variable (oval) is hypothesised to cause variation in the observed indicator (rectangle)",
    "The observed indicator (rectangle) causes the latent variable (oval)",
    "The two variables are simply correlated, with no causal direction implied",
    "The arrow indicates that the two variables share measurement error"
  ),
  type = "radio",
  q_id = "CFA_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! In CFA (and SEM generally), latent variables are modelled as common causes of their observed indicators. The direction of causality runs from the latent oval to the observed rectangle. This is called a *reflective* measurement model — changes in the latent construct are reflected in corresponding changes in each indicator.",
  wrong = "Think about what a latent variable is: an unobserved construct that we assume underlies (causes) the pattern of responses on the observed items. Which direction should the arrow point?"
)
```

**Q2. A CFA model returns CFI = .88 and RMSEA = .09. What is the most appropriate conclusion?**

```{r}
#| echo: false
#| label: "CFA_Q2"
check_question(
  "The model fit is poor — both indices fall below the recommended thresholds. The model should be inspected and potentially revised.",
  options = c(
    "The model fit is poor — both indices fall below the recommended thresholds. The model should be inspected and potentially revised.",
    "The model fits well — CFI and RMSEA are never both good at the same time",
    "The fit is acceptable because RMSEA < .10",
    "No conclusion can be drawn without the chi-square p-value"
  ),
  type = "radio",
  q_id = "CFA_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! CFI = .88 falls below the commonly accepted threshold of ≥ .90 (and well below the ≥ .95 threshold for good fit). RMSEA = .09 exceeds the acceptable upper bound of ≤ .08. Both indices point to poor fit. The researcher should examine modification indices, check whether items cross-load onto the wrong factors, and consider whether the theoretical model needs revision.",
  wrong = "Check the recommended thresholds: CFI ≥ .90 (good: ≥ .95) and RMSEA ≤ .08 (good: ≤ .05). Do CFI = .88 and RMSEA = .09 meet these?"
)
```

**Q3. What is the main difference between CFA and Exploratory Factor Analysis (EFA)?**

```{r}
#| echo: false
#| label: "CFA_Q3"
check_question(
  "In CFA the researcher specifies which indicators belong to which factors in advance based on theory; in EFA the factor structure is discovered empirically from the data",
  options = c(
    "In CFA the researcher specifies which indicators belong to which factors in advance based on theory; in EFA the factor structure is discovered empirically from the data",
    "CFA always produces better-fitting models than EFA",
    "EFA requires a larger sample size than CFA",
    "CFA is used for continuous variables; EFA is used for categorical variables"
  ),
  type = "radio",
  q_id = "CFA_Q3",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! CFA is a confirmatory, theory-driven technique. The researcher decides in advance — based on theory and prior evidence — which observed items load onto which latent factors, and then tests whether this structure is consistent with the data. EFA makes no such prior commitments: it uses the data itself to determine how many factors are needed and which items load onto each. EFA is appropriate for scale development and initial exploration; CFA is appropriate for testing an established theoretical measurement structure.",
  wrong = "The key distinction is about the role of theory vs. data in determining factor structure. Which technique imposes a theoretical structure before looking at the data?"
)
```

---

# Full Structural Equation Model {#fullsem}

::: {.callout-note}
## Section Overview

**What you will learn:** How to extend a CFA measurement model by adding directional structural paths between latent variables and outcomes.

**Key concepts:** Endogenous vs. exogenous variables, structural paths, disturbances, standardised path coefficients.
:::

Once we are satisfied with the measurement model, we add the **structural paths** — the directional hypotheses about how the latent variables relate to each other and to the writing score outcome. Our theoretical model predicts:

1. **Anxiety** → **Writing Score** (negative effect: more anxious students perform worse)
2. **Self-Efficacy** → **Writing Score** (positive effect)
3. **Self-Efficacy** → **Motivation** (positive effect: more efficacious students are more motivated)
4. **Motivation** → **Writing Score** (positive effect)

Path (3) combined with path (4) constitutes an **indirect effect** of Self-Efficacy on Writing Score *through* Motivation — a mediation hypothesis examined in [Section 6](#mediation).

## Specifying the full SEM {-}

In `lavaan`, structural paths are specified using the `~` operator, which is read as *"is regressed on"*:

```
Outcome ~ Predictor
```

We combine the measurement model with the structural paths in a single model string:

```{r sem-spec, message=FALSE, warning=FALSE}
sem_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths ---
  MOT ~ EFF
  writing_score ~ ANX + EFF + MOT
'
```

::: {.callout-note}
## Endogenous vs. exogenous variables

In SEM terminology:

- **Exogenous variables** have no incoming arrows (they are only predictors, never outcomes). In our model, *ANX* and *EFF* are exogenous latent variables.
- **Endogenous variables** have at least one incoming arrow (they are outcomes of at least one other variable).
*MOT* and *writing_score* are endogenous.

Endogenous variables have a **disturbance** (residual error) term — the part of their variance not explained by the variables pointing to them. `lavaan` estimates disturbances automatically.
:::

## Fitting the full SEM {-}

We fit the full SEM using `lavaan::sem()`. The syntax is identical to `cfa()` but with the full model specification:

```{r sem-fit, message=FALSE, warning=FALSE}
sem_fit <- lavaan::sem(sem_model, data = semdata, estimator = "ML")
summary(sem_fit, fit.measures = TRUE, standardized = TRUE)
```

## Structural path estimates {-}

```{r sem-paths, message=FALSE, warning=FALSE}
sem_paths_df <- lavaan::standardizedsolution(sem_fit) |>
  dplyr::filter(op == "~") |>
  dplyr::select(Outcome = lhs, Predictor = rhs, Std_Estimate = est.std, SE = se, z = z, p = pvalue) |>
  dplyr::mutate(
    # Assign significance stars from the unrounded p-values, then round for display
    Sig = dplyr::case_when(
      p < .001 ~ "***",
      p < .01 ~ "**",
      p < .05 ~ "*",
      TRUE ~ ""
    ),
    across(where(is.numeric), ~round(.x, 3))
  )

sem_paths_df |>
  flextable() |>
  flextable::set_table_properties(width = .90, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Standardised structural path coefficients from the full SEM.") |>
  flextable::border_outer()
```

Standardised path coefficients can be interpreted similarly to standardised regression coefficients (β): they indicate the expected change in the outcome (in standard deviation units) for a one-standard-deviation increase in the predictor, holding all other predictors constant.

## Visualising the full SEM {-}

```{r sem-plot, message=FALSE, warning=FALSE, fig.width=10, fig.height=7}
semPlot::semPaths(
  sem_fit,
  what = "std",
  layout = "tree2",
  rotation = 2,
  edge.label.cex = 0.80,
  sizeMan = 6,
  sizeLat = 10,
  color = list(lat = "steelblue", man = "lightyellow"),
  title = FALSE,
  style = "lisrel",
  residuals = TRUE,
  curvePivot = TRUE
)
title("Full SEM — standardised solution", cex.main = 1)
```

## R² for endogenous variables {-}

```{r sem-rsq, message=FALSE, warning=FALSE}
data.frame(
  Variable = names(lavaan::inspect(sem_fit, "r2")),
  R2 = round(as.numeric(lavaan::inspect(sem_fit, "r2")), 3)
) |>
  dplyr::filter(R2 > 0) |>
  flextable() |>
  flextable::set_table_properties(width = .45, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "center") |>
  flextable::set_caption(caption = "Proportion of variance explained (R2) for endogenous variables.") |>
  flextable::border_outer()
```

---

::: {.callout-tip}
## Exercises: Full SEM
:::

**Q1. In the `lavaan` model syntax, what does the `~` operator specify?**

```{r}
#| echo: false
#| label: "SEM_Q1"
check_question(
  "A directional structural path: the variable on the left is regressed on the variable on the right",
  options = c(
    "A directional structural path: the variable on the left is regressed on the variable on the right",
    "A measurement relationship: a latent variable is measured by an indicator",
    "A covariance between two variables",
    "An equality constraint between two parameters"
  ),
  type = "radio",
  q_id = "SEM_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! In lavaan syntax, `~` specifies a regression path. `Y ~ X` means Y is regressed on X — X is the predictor and Y is the outcome. This is analogous to the regression formula notation in base R (e.g., `lm(Y ~ X)`). The three key operators are: `=~` for measurement (latent → indicator), `~` for regression (predictor → outcome), and `~~` for covariances.",
  wrong = "Think about how R's formula notation works. In `lm(Y ~ X)`, what does `~` separate? The same logic applies in lavaan."
)
```

**Q2.
A standardised structural path coefficient of β = −0.42 (p < .001) from Anxiety to Writing Score means:**

```{r}
#| echo: false
#| label: "SEM_Q2"
check_question(
  "A one-standard-deviation increase in Anxiety is associated with a 0.42 standard deviation decrease in Writing Score, holding other variables constant",
  options = c(
    "A one-standard-deviation increase in Anxiety is associated with a 0.42 standard deviation decrease in Writing Score, holding other variables constant",
    "42% of the variance in Writing Score is explained by Anxiety",
    "Anxiety causes Writing Score to decrease by 42 points on the raw scale",
    "The correlation between Anxiety and Writing Score is -0.42"
  ),
  type = "radio",
  q_id = "SEM_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! Standardised path coefficients (β) are interpreted like standardised regression coefficients: a one-SD increase in the predictor is associated with a β-SD change in the outcome, controlling for other variables in the model. The negative sign confirms the expected direction: higher anxiety is associated with lower writing performance. This is not the same as a correlation (which is bivariate), nor does it tell us the proportion of variance explained (that would be R²).",
  wrong = "Standardised path coefficients have the same interpretation as standardised regression coefficients (β). Think about what a standardised coefficient tells you about the relationship between two variables measured in standard deviation units."
)
```

---

# Mediation Analysis {#mediation}

::: {.callout-note}
## Section Overview

**What you will learn:** How to test mediation hypotheses — indirect effects of one variable on another via a third — within an SEM framework.

**Key concepts:** Direct effects, indirect effects, total effects, bootstrapped confidence intervals.
:::

## What is mediation? {-}

**Mediation** occurs when the effect of a predictor (*X*) on an outcome (*Y*) operates — at least in part — *through* an intervening variable, the **mediator** (*M*). Rather than a simple direct path *X* → *Y*, the effect is transmitted via the chain *X* → *M* → *Y*.

In our example, the theoretical mediation hypothesis is:

> **Self-Efficacy** (*EFF*) influences **Writing Score** both directly and *indirectly* by increasing **Motivation** (*MOT*), which in turn improves **Writing Score**.

This decomposes the total effect of Self-Efficacy on Writing Score into a **direct effect** (*EFF* → *writing_score*), an **indirect effect** via Motivation (*EFF* → *MOT* → *writing_score*), and the **total effect** (direct + indirect).

## Specifying mediation in `lavaan` {-}

`lavaan` uses **labels** to name individual paths, which can then be combined using the `:=` operator to define new parameters such as indirect and total effects. Labels are assigned by prefixing a path coefficient with a name followed by `*`:

```{r med-spec, message=FALSE, warning=FALSE}
mediation_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (labelled for mediation) ---
  MOT ~ a * EFF                    # path a: EFF -> MOT
  writing_score ~ b * MOT          # path b: MOT -> writing_score
  writing_score ~ c * EFF + ANX    # path c: direct EFF -> writing_score

  # --- Defined parameters ---
  indirect := a * b                # indirect effect of EFF via MOT
  total    := c + (a * b)          # total effect of EFF on writing_score
'
```

## Bootstrapped confidence intervals for indirect effects {-}

Indirect effects are the *product* of two path coefficients (*a × b*). Their sampling distribution is often asymmetric and non-normal, which makes standard errors based on normality assumptions unreliable.
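
The asymmetry described above can be made concrete with a minimal simulation sketch. It assumes hypothetical sampling distributions for the two paths (means of 0.30 and standard errors of 0.15, chosen purely for illustration and not estimated from `semdata`) and compares a symmetric normal-theory interval for the product with a percentile interval. The helper `skewness()` is defined here for the example and is not part of `lavaan`.

```r
# Illustrative simulation with assumed (hypothetical) values, not estimates from semdata
set.seed(123)
n_draws <- 100000

a <- rnorm(n_draws, mean = 0.30, sd = 0.15)  # assumed sampling distribution of path a
b <- rnorm(n_draws, mean = 0.30, sd = 0.15)  # assumed sampling distribution of path b
ab <- a * b                                  # implied distribution of the indirect effect

# Moment-based skewness: approximately 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Symmetric normal-theory interval vs. percentile interval for the product
normal_ci <- mean(ab) + c(-1, 1) * qnorm(0.975) * sd(ab)
percentile_ci <- quantile(ab, c(0.025, 0.975))

round(c(skew_a = skewness(a), skew_ab = skewness(ab)), 2)
round(rbind(normal = normal_ci, percentile = unname(percentile_ci)), 3)
```

Under these assumed values each path is symmetric while their product should come out right-skewed, so the symmetric interval and the percentile interval place their limits differently.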
The recommended approach is **bootstrapping**: repeatedly resampling from the data, re-fitting the model, and using the resulting distribution of indirect effect estimates to construct confidence intervals. If the 95% bootstrapped CI does not contain zero, the indirect effect is statistically significant [@fuoli2022sem; @kline2023principles].

```{r med-fit, message=FALSE, warning=FALSE, cache=TRUE}
set.seed(42)

med_fit <- lavaan::sem(
  mediation_model,
  data = semdata,
  estimator = "ML",
  se = "bootstrap",
  bootstrap = 1000
)

med_effects <- lavaan::parameterEstimates(med_fit, boot.ci.type = "bca.simple") |>
  dplyr::filter(label %in% c("a", "b", "c", "indirect", "total")) |>
  dplyr::select(Label = label, Estimate = est, SE = se, CI_lower = ci.lower, CI_upper = ci.upper, p = pvalue) |>
  dplyr::mutate(across(where(is.numeric), ~round(.x, 3)))

med_effects |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Direct, indirect (mediated), and total effects with bootstrapped 95% CIs (1000 resamples).") |>
  flextable::border_outer()
```

## Interpreting mediation results {-}

To interpret mediation, we examine: (1) **Path *a*** (*EFF* → *MOT*): Is Self-Efficacy a significant predictor of Motivation? (2) **Path *b*** (*MOT* → *writing_score*): Is Motivation a significant predictor of Writing Score (controlling for other predictors)? (3) **Indirect effect** (*a × b*): Is the product significant, as indicated by a 95% CI that excludes zero?
(4) **Direct effect *c*** (*EFF* → *writing_score*): Does Self-Efficacy still predict Writing Score after accounting for the mediation?

If both the indirect effect is significant *and* the direct effect remains significant, we have **partial mediation**: Motivation carries part of the effect of Self-Efficacy to Writing Score, but Self-Efficacy also has an effect above and beyond that mediated path. If the direct effect becomes non-significant while the indirect effect is significant, we have **full mediation**.

::: {.callout-note}
## A note on causal language

Mediation analysis is often discussed in causal terms ("X causes Y through M"). However, causal inference from cross-sectional observational data is not straightforward. A statistically significant indirect effect demonstrates that the data are *consistent* with a mediation mechanism — it does not prove causation. To make stronger causal claims, researchers need longitudinal designs, experimental manipulation of the mediator, or other causal identification strategies [@kline2023principles].
:::

---

::: {.callout-tip}
## Exercises: Mediation
:::

**Q1. What is the indirect effect in a mediation model?**

```{r}
#| echo: false
#| label: "MED_Q1"
check_question(
  "The product of the path from the predictor to the mediator (a) and the path from the mediator to the outcome (b): indirect = a × b",
  options = c(
    "The product of the path from the predictor to the mediator (a) and the path from the mediator to the outcome (b): indirect = a × b",
    "The direct path from the predictor to the outcome, bypassing the mediator",
    "The correlation between the predictor and the mediator",
    "The total variance in the outcome explained by all predictors"
  ),
  type = "radio",
  q_id = "MED_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! The indirect effect quantifies how much of the predictor's influence on the outcome is transmitted via the mediator. It is computed as the product of two paths: (a) the effect of the predictor on the mediator, and (b) the effect of the mediator on the outcome controlling for the predictor. If either a or b is zero, the indirect effect is zero — both links in the chain must be non-zero for mediation to occur.",
  wrong = "In a mediation chain X → M → Y, the indirect effect is the amount of X's influence on Y that travels through M. How would you quantify that using the two path coefficients a (X→M) and b (M→Y)?"
)
```

**Q2. Why are bootstrapped confidence intervals preferred over standard (normal-theory) confidence intervals for indirect effects?**

```{r}
#| echo: false
#| label: "MED_Q2"
check_question(
  "Because indirect effects are products of two path coefficients, their sampling distribution is often asymmetric and non-normal — bootstrapping does not assume normality and therefore produces more accurate CIs",
  options = c(
    "Because indirect effects are products of two path coefficients, their sampling distribution is often asymmetric and non-normal — bootstrapping does not assume normality and therefore produces more accurate CIs",
    "Because bootstrapping always produces wider, more conservative confidence intervals",
    "Because standard CIs are only valid for indirect effects with more than two paths",
    "Bootstrapped CIs are not actually preferred — standard CIs are equally appropriate"
  ),
  type = "radio",
  q_id = "MED_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! The indirect effect a×b is the product of two random variables. Even if a and b are each normally distributed, their product is not — it tends to be asymmetrically distributed, especially in smaller samples. Standard (Sobel-test) CIs assume normality of the sampling distribution, so they are symmetric by construction and can misplace the interval limits when the true distribution is skewed. Bootstrapping resamples from the actual data and builds an empirical distribution of the indirect effect, producing CIs that can capture the asymmetry. This is why a bias-corrected bootstrapped CI is recommended (lavaan's bca.simple option corrects for bias but not for acceleration).",
  wrong = "Think about the mathematical structure of the indirect effect: it is a product of two estimated path coefficients. What does this imply about the shape of its sampling distribution?"
)
```

---

# Model Comparison and Modification {#modelcomp}

::: {.callout-note}
## Section Overview

**What you will learn:** How to compare alternative SEM specifications using formal tests and fit indices, and how to use modification indices responsibly.

**Key concepts:** Nested models, likelihood ratio (chi-square difference) test, AIC/BIC, modification indices.
:::

## Why compare models? {-}

In practice, researchers often have competing theoretical models — alternative specifications that make different predictions about which paths should be present or absent. SEM provides tools for formally comparing such models. Two situations arise:

1. **Nested models**: Model A is a special case of Model B (Model A is Model B with one or more paths fixed to zero). These can be compared with a **chi-square difference test (Δχ²)**.
2. **Non-nested models**: Neither model is a special case of the other. These are compared using **information criteria** (AIC, BIC): lower values indicate better fit, penalised for model complexity.

## Comparing a constrained model {-}

Suppose a reviewer argues that the direct path from Self-Efficacy to Writing Score is unnecessary and that all of Self-Efficacy's influence on Writing Score is mediated through Motivation.
We test this by fitting a **constrained model** with the direct *EFF* → *writing_score* path removed:

```{r nested-mod, message=FALSE, warning=FALSE}
constrained_model <- '
  # --- Measurement model ---
  ANX =~ anx1 + anx2 + anx3
  EFF =~ eff1 + eff2 + eff3
  MOT =~ mot1 + mot2 + mot3

  # --- Structural paths (direct EFF -> writing_score path removed) ---
  MOT ~ EFF
  writing_score ~ ANX + MOT
'

constrained_fit <- lavaan::sem(constrained_model, data = semdata, estimator = "ML")

lavaan::lavTestLRT(constrained_fit, sem_fit)
```

A significant Δχ² (*p* < .05) indicates that the constrained model fits significantly worse — that is, removing the direct path causes a significant deterioration in fit, providing evidence that the direct path contributes meaningfully and should be retained.

```{r aic-bic, message=FALSE, warning=FALSE}
data.frame(
  Model = c("Full model (with direct EFF path)", "Constrained model (no direct EFF path)"),
  AIC = round(c(AIC(sem_fit), AIC(constrained_fit)), 1),
  BIC = round(c(BIC(sem_fit), BIC(constrained_fit)), 1)
) |>
  flextable() |>
  flextable::set_table_properties(width = .80, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Model comparison: AIC and BIC for the full and constrained models.") |>
  flextable::border_outer()
```

The preferred model has the **lower** AIC (and lower BIC). A difference of more than 10 in BIC is generally considered strong evidence in favour of the model with the lower value.

## Modification indices {-}

If a model fits poorly, **modification indices (MIs)** can help diagnose which additional paths or covariances would most improve fit. Each MI indicates how much the overall model χ² would decrease if a particular currently-fixed parameter were freed.

```{r mod-indices, message=FALSE, warning=FALSE}
mi <- lavaan::modindices(sem_fit, sort. = TRUE, maximum.number = 10)

mi |>
  dplyr::select(lhs, op, rhs, mi, epc) |>
  dplyr::mutate(across(c(mi, epc), ~round(.x, 3))) |>
  dplyr::rename(LHS = lhs, Operator = op, RHS = rhs, MI = mi, `Expected Parameter Change` = epc) |>
  flextable() |>
  flextable::set_table_properties(width = .85, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Top 10 modification indices (sorted by MI, descending). MI > 10 typically warrants attention.") |>
  flextable::border_outer()
```

::: {.callout-important}
## Using modification indices responsibly

Modification indices are a double-edged sword. They are useful for diagnosing *systematic* misfit (e.g., correlated residuals between items that share method variance). However, acting on every high MI and re-fitting the model is a form of **capitalising on chance**: the revised model will fit the current sample better but may not generalise.

Rules of thumb for responsible use [@jackson2009reporting]:

1. **Theory first**: only free a parameter if there is a substantive, theoretically defensible reason to do so.
2. **One at a time**: modify one parameter, re-fit, re-inspect — do not free multiple parameters simultaneously.
3. **Cross-validate**: if sample size permits, split the data and use one half to explore modifications and the other to confirm them.
4. **Report transparently**: if modifications were made post-hoc, report this explicitly and distinguish the revised model from the originally hypothesised model.
:::

---

::: {.callout-tip}
## Exercises: Model Comparison
:::

**Q1.
What does a significant chi-square difference test (Δχ²) between two nested models indicate?**

```{r}
#| echo: false
#| label: "MC_Q1"
check_question(
  "The more constrained (simpler) model fits significantly worse than the less constrained (more complex) model — the freed parameter(s) contribute meaningfully to model fit",
  options = c(
    "The more constrained (simpler) model fits significantly worse than the less constrained (more complex) model — the freed parameter(s) contribute meaningfully to model fit",
    "The two models are equivalent in fit and either can be used",
    "The more complex model should always be rejected in favour of parsimony",
    "The chi-square difference test only applies to non-nested models"
  ),
  type = "radio",
  q_id = "MC_Q1",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! When Model A is nested within Model B (A has more constraints / fewer free parameters), the delta chi-square = chi-square(A) - chi-square(B) follows a chi-square distribution with degrees of freedom equal to the difference in df between the two models. A significant result (p < .05) means the extra parameters in Model B account for a statistically significant improvement in fit — the simpler model does not fit as well. A non-significant result supports the simpler (more parsimonious) model.",
  wrong = "Think about what it means for a more constrained model to have a higher chi-square. A significant delta chi-square means the constraints imposed in the simpler model cause a significant deterioration in fit. What does that imply about those constraints?"
)
```

**Q2. A modification index of 24.5 suggests adding a cross-loading of `anx2` onto the EFF factor. Should you add this path?**

```{r}
#| echo: false
#| label: "MC_Q2"
check_question(
  "Not necessarily — only if there is a substantive theoretical justification. Adding paths purely because their MI is large constitutes post-hoc model fishing and inflates Type I error.",
  options = c(
    "Not necessarily — only if there is a substantive theoretical justification. Adding paths purely because their MI is large constitutes post-hoc model fishing and inflates Type I error.",
    "Yes — any MI above 10 must be freed to achieve acceptable model fit",
    "Yes — a larger MI always means the path is theoretically meaningful",
    "No — modification indices should never be consulted after model fitting"
  ),
  type = "radio",
  q_id = "MC_Q2",
  random_answer_order = TRUE,
  button_label = "Check answer",
  right = "Correct! A high MI tells you that freeing a parameter would improve statistical fit, but it says nothing about whether that parameter is *theoretically meaningful*. Freeing every high-MI parameter is a form of capitalising on chance — the model becomes over-fitted to the current sample and will not generalise. Always ask: 'Is there a substantive reason why anx2 should also reflect self-efficacy?' If the answer is no, the path should not be added, regardless of the MI value.",
  wrong = "A high modification index means a path would improve chi-square fit — but does a better chi-square automatically mean the path is theoretically defensible?"
)
```

---

# Reporting Standards {#reporting}

::: {.callout-note}
## Section Overview

**What you will learn:** What to report in an SEM study, model reporting paragraph templates, a workflow summary table, and a reporting checklist.
:::

Reporting SEM results clearly and completely is as important as the analysis itself.

---

## General principles {-}

::: {.callout-note}
## What to report in an SEM study

Following current best practice [@kline2023principles; @jackson2009reporting; @larsson2021sem]:

**Model specification**

- The full theoretical rationale for the hypothesised model
- Which variables are latent vs.
observed; which indicators load onto which factors
- Software and estimator used (e.g., "Models were estimated in R using the `lavaan` package [@rosseel2012lavaan] with Maximum Likelihood estimation")

**Measurement model (CFA)**

- Standardised factor loadings for all indicators (with SEs and significance)
- All model fit indices: χ²(df), CFI, TLI, RMSEA (with 90% CI), SRMR
- Scale reliabilities (McDonald's ω or Cronbach's α)

**Structural model**

- Standardised path coefficients (with SEs and significance)
- R² for all endogenous variables
- Model fit indices

**Mediation (if applicable)**

- Labelled paths (a, b, c/c'), indirect effect, total effect
- Bootstrapped confidence intervals (state number of resamples)
- Whether partial or full mediation was found

**Model comparisons (if applicable)**

- Δχ², Δdf, p-value for nested comparisons
- AIC/BIC for non-nested comparisons
:::

---

## Model reporting paragraphs {-}

### CFA

> A three-factor measurement model was specified *a priori* based on the theoretical framework, with Language Anxiety (*ANX*), Writing Self-Efficacy (*EFF*), and Motivation (*MOT*) each indicated by three Likert-scale items (nine indicators in total). The model was estimated using Maximum Likelihood in R (`lavaan`; @rosseel2012lavaan). Model fit was excellent: χ²(df) = *X.XX*, CFI = .97, TLI = .96, RMSEA = .04 [90% CI: .01, .07], SRMR = .04. All standardised factor loadings were significant and exceeded 0.70 (range: .71–.82), and McDonald's ω exceeded .80 for all three scales, indicating good reliability. The measurement model was retained for subsequent structural analysis.

### Full SEM

> The structural model specified directional effects of Language Anxiety, Writing Self-Efficacy, and Motivation on Writing Score, and an effect of Self-Efficacy on Motivation. Model fit was acceptable: χ²(df) = *X.XX*, CFI = .96, TLI = .95, RMSEA = .05 [90% CI: .02, .07], SRMR = .05.
Writing Self-Efficacy was a significant positive predictor of both Motivation (β = .55, SE = .07, *p* < .001) and Writing Score (β = .47, SE = .08, *p* < .001). Language Anxiety was a significant negative predictor of Writing Score (β = −.38, SE = .07, *p* < .001). Motivation significantly predicted Writing Score (β = .21, SE = .07, *p* = .003). Together, the predictors explained 58% of the variance in Writing Score.

### Mediation

> To test whether the effect of Writing Self-Efficacy on Writing Score was partially mediated by Motivation, we re-estimated the model with labelled paths and requested 1000 bootstrap resamples for inference on the indirect effect [@fuoli2022sem]. The indirect effect of Self-Efficacy on Writing Score via Motivation was significant (unstandardised *b* = *X.XX*, 95% bias-corrected bootstrap CI [*X.XX*, *X.XX*]), indicating that part of the positive effect of self-efficacy on writing performance operates through increased motivation. The direct effect of Self-Efficacy on Writing Score remained significant after accounting for this indirect path, supporting **partial mediation**.

---

## Quick reference: SEM workflow {-}

```{r workflow-table, echo=FALSE, message=FALSE, warning=FALSE}
data.frame(
  Step = c(
    "1. Theoretical specification",
    "2. Descriptive checks",
    "3. Confirmatory Factor Analysis",
    "4. Evaluate measurement fit",
    "5. Assess reliability",
    "6. Full SEM",
    "7. Mediation (if applicable)",
    "8. Model comparison",
    "9. Report"
  ),
  Action = c(
    "Draw path diagram; specify which indicators load onto which factors and which structural paths are hypothesised",
    "Examine distributions (skewness, kurtosis), correlations; check for multivariate outliers",
    "Fit measurement model with lavaan::cfa()",
    "Inspect CFI, TLI, RMSEA, SRMR against recommended thresholds",
    "Compute McDonald's omega with semTools::reliability()",
    "Add structural paths; fit with lavaan::sem()",
    "Label paths; define indirect/total effects with ':='; use se = 'bootstrap'",
    "Use lavTestLRT() for nested models; AIC/BIC for non-nested; consult MIs with theory",
    "Report all fit indices, standardised loadings, path coefficients, R2, and effect CIs"
  ),
  `Key R function(s)` = c(
    "—",
    "psych::describe(); cor()",
    "lavaan::cfa()",
    "lavaan::fitMeasures()",
    "semTools::reliability()",
    "lavaan::sem()",
    "lavaan::sem(se = 'bootstrap')",
    "lavaan::lavTestLRT(); AIC(); modindices()",
    "lavaan::standardizedsolution(); parameterEstimates()"
  ),
  check.names = FALSE
) |>
  flextable() |>
  flextable::set_table_properties(width = .99, layout = "autofit") |>
  flextable::theme_zebra() |>
  flextable::fontsize(size = 11) |>
  flextable::fontsize(size = 11, part = "header") |>
  flextable::align_text_col(align = "left") |>
  flextable::set_caption(caption = "Step-by-step SEM workflow with key R functions.") |>
  flextable::border_outer()
```

# Citation & Session Info {.unnumbered}

::: {.callout-note}
## Citation

```{r citation-callout, echo=FALSE, results='asis'}
cat(
  params$author, ". ",
  params$year, ". *",
  params$title, "*. ",
  params$institution, ". ",
  "url: ", params$url, " ",
  "(Version ", params$version, "), ",
  "doi: ", params$doi, ".",
  sep = ""
)
```

```{r citation-bibtex, echo=FALSE, results='asis'}
# Build a citation key from the first author's surname, the year, and the first title word
key <- paste0(
  tolower(gsub(" ", "", gsub(",.*", "", params$author))),
  params$year,
  tolower(gsub("[^a-zA-Z]", "", strsplit(params$title, " ")[[1]][1]))
)
cat("```\n")
cat("@manual{", key, ",\n", sep = "")
cat(" author = {", params$author, "},\n", sep = "")
cat(" title = {", params$title, "},\n", sep = "")
cat(" year = {", params$year, "},\n", sep = "")
cat(" note = {", params$url, "},\n", sep = "")
cat(" organization = {", params$institution, "},\n", sep = "")
cat(" edition = {", params$version, "},\n", sep = "")  # trailing comma so the doi field parses
cat(" doi = {", params$doi, "}\n", sep = "")
cat("}\n```\n")
```
:::

```{r fin}
sessionInfo()
```

::: {.callout-note}
## AI Transparency Statement

This tutorial was re-developed with the assistance of **Claude** (claude.ai), a large language model created by Anthropic. Claude was used to help revise the tutorial text, structure the instructional content, generate the R code examples, and write the `checkdown` quiz questions and feedback strings. All content was reviewed, edited, and approved by the author (Martin Schweinberger), who takes full responsibility for the accuracy and pedagogical appropriateness of the material. The use of AI assistance is disclosed here in the interest of transparency and in accordance with emerging best practices for AI-assisted academic content creation.
:::

[Back to top](#intro)

[Back to HOME](/index.html)

# References {.unnumbered}